Introduction

Aim: To pre-process raw data from GEO using RMA and to perform a differential gene expression analysis comparing the COPD and CONTROL groups.

COPD

Chronic Obstructive Pulmonary Disease (COPD) is characterized by emphysema and chronic bronchitis. It is diagnosed using spirometry and clinical information, which leads to a heterogeneous population of COPD patients. The WHO ranking of non-communicable diseases places COPD among the leading causes of mortality. Tobacco is the main risk factor, but different genetic variants have also been associated with this disease.

Different researchers have analyzed COPD transcriptomics using high-throughput data such as microarrays and RNA-seq. We believe it would be relevant to unravel a robust gene expression signature for COPD patients that holds across different experiments and laboratories.

Background

This script is part of an analysis performed with PulmonDB data that aims to determine a common set of differentially expressed genes. The analysis was divided into different vignettes, each computing different steps. In this script our objective is to do a meta-analysis with raw data downloaded from GEO. The results will be used as a reference and compared with a meta-analysis computed with PulmonDB sample contrasts.

We want to pre-process and re-analyze transcriptomic experiments from GEO that have:

  • Lung tissue samples

  • COPD vs CONTROL group

Input

This script needs the following files:

Data 1: Table of GSE experiments
Data 2: Raw data (.CEL, .TXT)
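Before running the full pipeline it can help to confirm that both inputs are in place. The following is a minimal sketch, assuming the layout used later in this script (a GSE_table.csv file and one celfiles/<GSE ID> folder per experiment inside the data directory). The helper name check_inputs and the temporary demo directory are hypothetical and only serve as illustration.

```r
# Hypothetical helper: report which expected inputs are present.
# Assumes <data_dir>/GSE_table.csv and <data_dir>/celfiles/<GSE ID>/.
check_inputs <- function(data_dir, gse_ids) {
  table_ok <- file.exists(file.path(data_dir, "GSE_table.csv"))
  cel_dirs <- file.path(data_dir, "celfiles", gse_ids)
  c(GSE_table = table_ok, setNames(dir.exists(cel_dirs), gse_ids))
}

# Toy demonstration in a temporary directory (not the real data)
demo_dir <- file.path(tempdir(), "inputs_demo")
dir.create(file.path(demo_dir, "celfiles", "GSE1122"),
           recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, "GSE_table.csv")))

status <- check_inputs(demo_dir, c("GSE1122", "GSE1650"))
print(status)
```

In the demo only the GSE1122 folder is created, so the entry for GSE1650 flags a missing folder.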

Setup

All the data has already been downloaded to the cluster 10.200.0.42. To access it:

ssh ana@10.200.0.42

cd /home/ana/R-projects/Meta-analysis_COPD

To run the script, type:

nohup R -e "rmarkdown::render('vignettes/RMA_DE-23-June-2020.Rmd')" &

The script can be found in: /home/ana/R-projects/Meta-analysis_COPD/vignettes

setwd("..")
PATH = getwd()
DATA_DIR = file.path(PATH,"data")
OUTPUT_DIR = file.path(PATH,"output_data")
FIG_DIR = file.path(PATH,"fig")
TODAY = Sys.Date()

knitr::knit_hooks$set(timeit = local({
  now = NULL
  function(before, options) {
    if (before) {
      now <<- Sys.time()
    } else {
      runtime = difftime(Sys.time(), now)
      now <<- NULL
      # use options$label if you want the chunk label as well
      paste('Time for this code chunk:', as.character(runtime))
    }
  }})
)

knitr::opts_knit$set(root.dir = PATH)
knitr::opts_chunk$set(echo = TRUE,
                      timeit=TRUE,
                      warning=FALSE,
                      attr.output='style="max-height: 500px;"')

And the analysis is run in: /home/ana/R-projects/Meta-analysis_COPD

Libraries

if (!requireNamespace("BiocManager", quietly = TRUE)) {
      install.packages("BiocManager")
  }

packages <- c("knitr",
          "oligo",
          "tidyverse",
          "limma",
          "SummarizedExperiment",
          "GEOquery",
          "DESeq2",
          "org.Hs.eg.db",
          "AnnotationDbi",
          "recount")

for(l in packages){
  if (!requireNamespace(l, quietly = TRUE)) {
    BiocManager::install(l)}
}

Time for this code chunk: 13.9964439868927

lapply(packages, library, character.only = TRUE)
## [[1]]
## [1] "knitr"     "stats"     "graphics"  "grDevices" "utils"     "datasets" 
## [7] "methods"   "base"     
## 
## [[2]]
##  [1] "oligo"        "Biostrings"   "XVector"      "IRanges"      "S4Vectors"   
##  [6] "stats4"       "Biobase"      "oligoClasses" "BiocGenerics" "parallel"    
## [11] "knitr"        "stats"        "graphics"     "grDevices"    "utils"       
## [16] "datasets"     "methods"      "base"        
## 
## [[3]]
##  [1] "forcats"      "stringr"      "dplyr"        "purrr"        "readr"       
##  [6] "tidyr"        "tibble"       "ggplot2"      "tidyverse"    "oligo"       
## [11] "Biostrings"   "XVector"      "IRanges"      "S4Vectors"    "stats4"      
## [16] "Biobase"      "oligoClasses" "BiocGenerics" "parallel"     "knitr"       
## [21] "stats"        "graphics"     "grDevices"    "utils"        "datasets"    
## [26] "methods"      "base"        
## 
## [[4]]
##  [1] "limma"        "forcats"      "stringr"      "dplyr"        "purrr"       
##  [6] "readr"        "tidyr"        "tibble"       "ggplot2"      "tidyverse"   
## [11] "oligo"        "Biostrings"   "XVector"      "IRanges"      "S4Vectors"   
## [16] "stats4"       "Biobase"      "oligoClasses" "BiocGenerics" "parallel"    
## [21] "knitr"        "stats"        "graphics"     "grDevices"    "utils"       
## [26] "datasets"     "methods"      "base"        
## 
## [[5]]
##  [1] "SummarizedExperiment" "DelayedArray"         "matrixStats"         
##  [4] "GenomicRanges"        "GenomeInfoDb"         "limma"               
##  [7] "forcats"              "stringr"              "dplyr"               
## [10] "purrr"                "readr"                "tidyr"               
## [13] "tibble"               "ggplot2"              "tidyverse"           
## [16] "oligo"                "Biostrings"           "XVector"             
## [19] "IRanges"              "S4Vectors"            "stats4"              
## [22] "Biobase"              "oligoClasses"         "BiocGenerics"        
## [25] "parallel"             "knitr"                "stats"               
## [28] "graphics"             "grDevices"            "utils"               
## [31] "datasets"             "methods"              "base"                
## 
## [[6]]
##  [1] "GEOquery"             "SummarizedExperiment" "DelayedArray"        
##  [4] "matrixStats"          "GenomicRanges"        "GenomeInfoDb"        
##  [7] "limma"                "forcats"              "stringr"             
## [10] "dplyr"                "purrr"                "readr"               
## [13] "tidyr"                "tibble"               "ggplot2"             
## [16] "tidyverse"            "oligo"                "Biostrings"          
## [19] "XVector"              "IRanges"              "S4Vectors"           
## [22] "stats4"               "Biobase"              "oligoClasses"        
## [25] "BiocGenerics"         "parallel"             "knitr"               
## [28] "stats"                "graphics"             "grDevices"           
## [31] "utils"                "datasets"             "methods"             
## [34] "base"                
## 
## [[7]]
##  [1] "DESeq2"               "GEOquery"             "SummarizedExperiment"
##  [4] "DelayedArray"         "matrixStats"          "GenomicRanges"       
##  [7] "GenomeInfoDb"         "limma"                "forcats"             
## [10] "stringr"              "dplyr"                "purrr"               
## [13] "readr"                "tidyr"                "tibble"              
## [16] "ggplot2"              "tidyverse"            "oligo"               
## [19] "Biostrings"           "XVector"              "IRanges"             
## [22] "S4Vectors"            "stats4"               "Biobase"             
## [25] "oligoClasses"         "BiocGenerics"         "parallel"            
## [28] "knitr"                "stats"                "graphics"            
## [31] "grDevices"            "utils"                "datasets"            
## [34] "methods"              "base"                
## 
## [[8]]
##  [1] "org.Hs.eg.db"         "AnnotationDbi"        "DESeq2"              
##  [4] "GEOquery"             "SummarizedExperiment" "DelayedArray"        
##  [7] "matrixStats"          "GenomicRanges"        "GenomeInfoDb"        
## [10] "limma"                "forcats"              "stringr"             
## [13] "dplyr"                "purrr"                "readr"               
## [16] "tidyr"                "tibble"               "ggplot2"             
## [19] "tidyverse"            "oligo"                "Biostrings"          
## [22] "XVector"              "IRanges"              "S4Vectors"           
## [25] "stats4"               "Biobase"              "oligoClasses"        
## [28] "BiocGenerics"         "parallel"             "knitr"               
## [31] "stats"                "graphics"             "grDevices"           
## [34] "utils"                "datasets"             "methods"             
## [37] "base"                
## 
## [[9]]
##  [1] "org.Hs.eg.db"         "AnnotationDbi"        "DESeq2"              
##  [4] "GEOquery"             "SummarizedExperiment" "DelayedArray"        
##  [7] "matrixStats"          "GenomicRanges"        "GenomeInfoDb"        
## [10] "limma"                "forcats"              "stringr"             
## [13] "dplyr"                "purrr"                "readr"               
## [16] "tidyr"                "tibble"               "ggplot2"             
## [19] "tidyverse"            "oligo"                "Biostrings"          
## [22] "XVector"              "IRanges"              "S4Vectors"           
## [25] "stats4"               "Biobase"              "oligoClasses"        
## [28] "BiocGenerics"         "parallel"             "knitr"               
## [31] "stats"                "graphics"             "grDevices"           
## [34] "utils"                "datasets"             "methods"             
## [37] "base"                
## 
## [[10]]
##  [1] "recount"              "org.Hs.eg.db"         "AnnotationDbi"       
##  [4] "DESeq2"               "GEOquery"             "SummarizedExperiment"
##  [7] "DelayedArray"         "matrixStats"          "GenomicRanges"       
## [10] "GenomeInfoDb"         "limma"                "forcats"             
## [13] "stringr"              "dplyr"                "purrr"               
## [16] "readr"                "tidyr"                "tibble"              
## [19] "ggplot2"              "tidyverse"            "oligo"               
## [22] "Biostrings"           "XVector"              "IRanges"             
## [25] "S4Vectors"            "stats4"               "Biobase"             
## [28] "oligoClasses"         "BiocGenerics"         "parallel"            
## [31] "knitr"                "stats"                "graphics"            
## [34] "grDevices"            "utils"                "datasets"            
## [37] "methods"              "base"

Time for this code chunk: 0.683966159820557

Experiments

We selected 7 experiments from PulmonDB that contain lung samples from COPD patients and also include a control group for comparison. These experiments are described in the following table:

gse_table <- read.csv(file.path(DATA_DIR,"GSE_table.csv"), row.names = 1)
kable(gse_table, caption = "GSE information")
GSE information

          Samples  Platform             Platform.manufacturer  Year  Category
GSE1122   15       [Hu6800]             Affymetrix             2004  Normal lung & Emphysema & AAD
GSE1650   30       [HG-U133A]           Affymetrix             2004  Normal or mild Emphysema & severe Emphysema
GSE27597  72       [HuEx-1_0-st]        Affymetrix             2011  Normal lung & COPD and Emphysema
GSE37768  38       [HG-U133_Plus_2]     Affymetrix             2016  Normal lung & COPD
GSE47460  582      Agilent-014850       Agilent                2013  Normal lung & COPD & ILD
GSE57148  189      Illumina HiSeq 2000  Illumina               2015  Normal lung & COPD
GSE8581   58       [HG-U133_Plus_2]     Affymetrix             2008  Normal lung & COPD

Time for this code chunk: 0.0177333354949951

GSE57148 is an RNA-seq experiment, so we do not normalize it with RMA; instead we download its counts from recount2.
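As a rough illustration of the recount2 step, the sketch below wraps the download in a function. It assumes the recount package functions download_study() and scale_counts(); the wrapper name fetch_recount_counts is hypothetical, and the SRA project accession that corresponds to GSE57148 is not stated in this section, so it is left as an argument rather than filled in.

```r
# Hypothetical wrapper: download gene-level counts for one recount2 project
# and scale them. `project` must be the SRA project accession of the study.
fetch_recount_counts <- function(project, outdir = tempdir()) {
  if (!requireNamespace("recount", quietly = TRUE)) {
    stop("Package 'recount' is required for this step")
  }
  # Saves rse_gene.Rdata (a RangedSummarizedExperiment) into outdir
  recount::download_study(project, type = "rse-gene", outdir = outdir)
  env <- new.env()
  load(file.path(outdir, "rse_gene.Rdata"), envir = env)
  # recount2 stores base-pair coverage; scale_counts() converts it to
  # read-count-like values suitable for count-based methods
  recount::scale_counts(env$rse_gene)
}
```

The function is only defined here, not called, because calling it needs network access and the project accession.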

Local functions

rawCEL_normCEL: Pre-process raw data

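We first read and pre-process the raw .CEL files. Each experiment has its own folder with the raw data per sample. We then normalize using the RMA algorithm and, finally, save the result in a .CSV file (in the code below the write.csv and pdf calls are commented out, so the function only draws the plots and returns the normalized object).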

In this function you need:

Input: GSE ID
Output: Samples normalized, boxplots and histograms from raw and normalized data

rawCEL_normCEL <- function(gse){
  # select CEL files
  celfiles <- list.celfiles(file.path(DATA_DIR,"celfiles",gse), full.names=TRUE,listGzipped=TRUE)
  # read CEL files in R
  rawData <- read.celfiles(celfiles)
  #### Figures of raw data
  #pdf(str_c("raw_",gse,"_boxplot",TODAY,".pdf"))
  ## boxplot of raw data
  boxplot(rawData,target="core")
  ## hist of raw data
  hist(rawData,target="core")
  #dev.off()
  ## RMA normalization
  normData <- rma(rawData)
  #### Figures of Normalized data
  #pdf(str_c("norm_",gse,"_boxplot",TODAY,".pdf"))
  ## boxplot of norm data
  boxplot(normData)
  ## hist of norm data
  hist(normData)
  # dev.off()
  #write.csv(exprs(normData),str_c(OUTPUT_DIR,"/",gse,"_normData",TODAY,".txt"),quote=F)
  return(normData)
}

#sapply(tissue,rawCEL_normCEL)

Time for this code chunk: 0.00420188903808594
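RMA combines background correction, quantile normalization and probe-set summarization. To make the normalization step concrete, here is a minimal base-R sketch of quantile normalization on a toy matrix. It is not the oligo implementation, it skips background correction and summarization, and the function name quantile_normalize and the simulated values are assumptions made for illustration.

```r
# Toy quantile normalization: force every sample (column) to share the
# same empirical distribution, as the normalization step of RMA does.
quantile_normalize <- function(x) {
  sorted <- apply(x, 2, sort)      # sort each column independently
  target <- rowMeans(sorted)       # reference distribution (mean per rank)
  out <- apply(x, 2, function(col) target[rank(col, ties.method = "first")])
  dimnames(out) <- dimnames(x)
  out
}

set.seed(1)
toy <- matrix(rnorm(40, mean = 8), nrow = 10,
              dimnames = list(paste0("probe", 1:10), paste0("GSM", 1:4)))
toy[, 2] <- toy[, 2] + 2           # simulate one array that is brighter overall

norm_toy <- quantile_normalize(toy)
boxplot(toy, main = "Before")      # GSM2 is shifted upwards
boxplot(norm_toy, main = "After")  # all samples share one distribution
```

After normalization every column contains the same set of values, only assigned to different probes, which is why the normalized boxplots produced by rawCEL_normCEL look aligned.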

get_GEO: get annotation using GEOquery package

This function downloads the ExpressionSet object with GEOquery, which carries the sample annotations; we then replace its expression values with our calculated pre-processed data.

Input: GSE ID, norm object with pre-processed values
Output: ExpressionSet object with GEOquery annotation and pre-processed values

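get_GEO <- function(gse, norm, i = 1){
  qx <- getGEO(gse)
  message("Data downloaded from GEOquery:")
  print(qx)
  # getGEO returns a list with one ExpressionSet per platform;
  # keep the only element, or element i when there are several
  if (length(qx) == 1) {
    qx <- qx[[1]]
  } else {
    qx <- qx[[i]]
  }
  message("Colnames of GEOquery object:")
  print(colnames(qx)[1:5])
  message("Colnames of calculated pre-processed data:")
  print(colnames(norm)[1:5])
  # Rename sample columns (change GSM18403.CEL.gz to GSM18403).
  # The renaming is positional, so both objects must list the samples
  # in the same order; report it if a file name does not start with its GSM ID.
  stopifnot(ncol(norm) == ncol(qx))
  if (!all(startsWith(colnames(norm), colnames(qx)))) {
    message("Check sample order: CEL file names do not match GEO sample names")
  }
  colnames(norm) <- colnames(qx)
  # Replace the GEO expression values with our pre-processed values
  exprs(qx) <- exprs(norm)
  return(qx)
}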

Time for this code chunk: 0.00354647636413574
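get_GEO renames the columns by position, which relies on the CEL files and the GEO samples being listed in the same order. A more defensive alternative is to match by name. The sketch below shows the idea on toy data in base R; the regular expression assumes file names of the form GSMxxxxx.CEL or GSMxxxxx.CEL.gz, as seen in this experiment, and the objects cel_names, geo_names and toy are made up for the example.

```r
# Toy example: align pre-processed columns to the GEO sample order by name
cel_names <- c("GSM18405.CEL.gz", "GSM18403.CEL.gz", "GSM18404.CEL.gz")
geo_names <- c("GSM18403", "GSM18404", "GSM18405")

toy <- matrix(1:6, nrow = 2,
              dimnames = list(c("probe1", "probe2"), cel_names))

# Strip the .CEL / .CEL.gz suffix to recover the GSM accession
clean <- sub("\\.CEL(\\.gz)?$", "", cel_names, ignore.case = TRUE)

# Position of every GEO sample among the CEL files; fail if one is missing
idx <- match(geo_names, clean)
stopifnot(!anyNA(idx))

aligned <- toy[, idx, drop = FALSE]
colnames(aligned) <- geo_names
print(aligned)
```

With this approach a shuffled file listing still ends up in GEO order, and a missing sample stops the script instead of silently mislabeling columns.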

DE: Differential expression analysis

Using this function, we get a table of differential expression results. It uses the limma package to fit a linear model and find genes differentially expressed between a “Control” and a “COPD” group.

Input: ExpressionSet object. Optional: colCOPD is the name of the column that holds the disease status; coeff selects which model coefficient is reported (by default, the coefficient in position 2)
Output: Table of differential expression results with all genes

DE <- function(ExpressionSet, colCOPD = "Disease", coeff = 2){
     # Build the design matrix (intercept + disease status) and fit limma
     design <- model.matrix(as.formula(paste("~ 1 +", colCOPD)),
                            data = pData(ExpressionSet))
     fit <- lmFit(ExpressionSet, design)
     # Empirical Bayes moderation of the lmFit model
     ebf <- eBayes(fit)
     # Print the coefficient names to check which contrast `coeff` selects
     print(colnames(coef(fit)))
     # Volcano plot of the selected coefficient
     volcanoplot(ebf, coef = coeff, highlight = 20, pch = 20)
     # Results for all genes (no p-value filter), with confidence intervals
     res <- topTable(ebf, number = Inf, p.value = 1, coef = coeff, confint = TRUE)
     # Return the results as a tibble
     as_tibble(res, rownames = "rownames")
}

Time for this code chunk: 0.00349020957946777
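To see why the coefficient in position 2 corresponds to the disease effect, it may help to look at the design matrix that the formula ~ 1 + Disease produces. The sketch below uses base R only, with made-up expression values for a single gene; it illustrates the coding, not the limma fit itself (limma adds the empirical Bayes moderation of the variances on top of the same coefficients).

```r
# Toy phenotype table: the first factor level is the reference group
pheno <- data.frame(
  Disease = factor(c("Control", "Control", "Control", "COPD", "COPD", "COPD"),
                   levels = c("Control", "COPD"))
)

# Same formula that DE() builds from colCOPD = "Disease"
design <- model.matrix(~ 1 + Disease, data = pheno)
print(colnames(design))

# Made-up log2 expression values for one gene
expr_gene <- c(5.0, 5.2, 4.8, 6.1, 5.9, 6.0)

# Ordinary least squares on the design matrix
fit <- lm.fit(design, expr_gene)
logFC <- unname(fit$coefficients[2])

# Coefficient 1 is the Control mean; coefficient 2 is COPD minus Control
print(logFC)
print(mean(expr_gene[4:6]) - mean(expr_gene[1:3]))
```

With a factor of three levels, as in GSE1122, the design gains a third column, and coeff = 3 would select the second non-reference group against the same control.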

GSE1122

This experiment evaluates gene expression profiles of emphysema using “usual” emphysema and Alpha-1 Antitrypsin Deficiency-related emphysema (AAD). As a control group, the authors used normal lung tissue from “organs donated for transplant, but unused due to age or size mismatch”; none of those individuals were smokers or reported clinical airflow limitation.

This study showed that inflammation, immune responses, and proteolysis are emphysema characteristics. They also found similarities and differences between AAD and “usual” emphysema.

The authors measured 15 lung samples: 5 controls, 5 “usual” emphysema and 5 AAD emphysema. The raw data is in CEL files; the platform used is [Hu6800] Affymetrix. The study was performed in Colorado, USA.

Pre-process raw data

We pre-processed the raw data using the function rawCEL_normCEL; the plots are shown as additional output.

gse1<- rownames(gse_table)[1] 
norm1 <- rawCEL_normCEL(gse1)
## Loading required package: pd.hu6800
## Loading required package: RSQLite
## Loading required package: DBI
## Platform design info loaded.
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE1122/GSM18403.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE1122/GSM18404.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE1122/GSM18405.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE1122/GSM18406.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE1122/GSM18407.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE1122/GSM18408.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE1122/GSM18409.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE1122/GSM18410.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE1122/GSM18411.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE1122/GSM18412.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE1122/GSM18413.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE1122/GSM18414.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE1122/GSM18415.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE1122/GSM18416.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE1122/GSM18417.CEL.gz

## Background correcting
## Normalizing
## Calculating Expression

norm1
## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 7129 features, 15 samples 
##   element names: exprs 
## protocolData
##   rowNames: GSM18403.CEL.gz GSM18404.CEL.gz ... GSM18417.CEL.gz (15
##     total)
##   varLabels: exprs dates
##   varMetadata: labelDescription channel
## phenoData
##   rowNames: GSM18403.CEL.gz GSM18404.CEL.gz ... GSM18417.CEL.gz (15
##     total)
##   varLabels: index
##   varMetadata: labelDescription channel
## featureData: none
## experimentData: use 'experimentData(object)'
## Annotation: pd.hu6800

Time for this code chunk: 11.0009229183197

Get annotation

We used the GEOquery package to obtain the sample annotations and combined them with our previously calculated pre-processed values to create an ExpressionSet object.

# get annotation using GEOquery package
geo1 <- get_GEO(gse1,norm1)
## Found 1 file(s)
## GSE1122_series_matrix.txt.gz
## Parsed with column specification:
## cols(
##   ID_REF = col_character(),
##   GSM18403 = col_double(),
##   GSM18404 = col_double(),
##   GSM18405 = col_double(),
##   GSM18406 = col_double(),
##   GSM18407 = col_double(),
##   GSM18408 = col_double(),
##   GSM18409 = col_double(),
##   GSM18410 = col_double(),
##   GSM18411 = col_double(),
##   GSM18412 = col_double(),
##   GSM18413 = col_double(),
##   GSM18414 = col_double(),
##   GSM18415 = col_double(),
##   GSM18416 = col_double(),
##   GSM18417 = col_double()
## )
## File stored at:
## /tmp/Rtmp4ZNwux/GPL80.soft
## Data downloaded from GEOquery:
## $GSE1122_series_matrix.txt.gz
## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 7129 features, 15 samples 
##   element names: exprs 
## protocolData: none
## phenoData
##   sampleNames: GSM18403 GSM18404 ... GSM18417 (15 total)
##   varLabels: title geo_accession ... data_row_count (26 total)
##   varMetadata: labelDescription
## featureData
##   featureNames: A28102_at AB000114_at ... Z97074_at (7129 total)
##   fvarLabels: ID GB_ACC ... Gene Ontology Molecular Function (16 total)
##   fvarMetadata: Column Description labelDescription
## experimentData: use 'experimentData(object)'
##   pubMedIds: 15284076 
## Annotation: GPL80
## Colnames of GEOquery object:
## [1] "GSM18403" "GSM18404" "GSM18405" "GSM18406" "GSM18407"
## Colnames of calculated pre-processed data:
## [1] "GSM18403.CEL.gz" "GSM18404.CEL.gz" "GSM18405.CEL.gz" "GSM18406.CEL.gz"
## [5] "GSM18407.CEL.gz"

Time for this code chunk: 4.41413021087646

Select column with COPD description

Each experiment has its own annotation and we needed to look for a column describing which sample is a “Control” and which one is “COPD”.

head(pData(geo1))
##           title geo_accession                status submission_date
## GSM18403 01_NML      GSM18403 Public on Jun 01 2004     Mar 09 2004
## GSM18404 02_NML      GSM18404 Public on Jun 01 2004     Mar 09 2004
## GSM18405 03_NML      GSM18405 Public on Jun 01 2004     Mar 09 2004
## GSM18406 04_NML      GSM18406 Public on Jun 01 2004     Mar 09 2004
## GSM18407 05_NML      GSM18407 Public on Jun 01 2004     Mar 09 2004
## GSM18408 02_ADL      GSM18408 Public on Jun 01 2004     Mar 09 2004
##          last_update_date type channel_count source_name_ch1 organism_ch1
## GSM18403      Nov 29 2006  RNA             1     lung tissue Homo sapiens
## GSM18404      Nov 29 2006  RNA             1     lung tissue Homo sapiens
## GSM18405      Nov 29 2006  RNA             1     lung tissue Homo sapiens
## GSM18406      Nov 29 2006  RNA             1     lung tissue Homo sapiens
## GSM18407      Nov 29 2006  RNA             1     lung tissue Homo sapiens
## GSM18408      Nov 29 2006  RNA             1     lung tissue Homo sapiens
##          molecule_ch1 taxid_ch1
## GSM18403    total RNA      9606
## GSM18404    total RNA      9606
## GSM18405    total RNA      9606
## GSM18406    total RNA      9606
## GSM18407    total RNA      9606
## GSM18408    total RNA      9606
##                                                     description platform_id
## GSM18403                                            Normal lung       GPL80
## GSM18404                                            Normal lung       GPL80
## GSM18405                                            Normal lung       GPL80
## GSM18406                                            Normal lung       GPL80
## GSM18407                                            Normal lung       GPL80
## GSM18408 Alpha-1 Antitrypsin Deficiency-related emphysemic lung       GPL80
##                   contact_name              contact_email contact_phone
## GSM18403 Christopher,D,Coldren Chris.Coldren@ucdenver.edu  303 724 6056
## GSM18404 Christopher,D,Coldren Chris.Coldren@ucdenver.edu  303 724 6056
## GSM18405 Christopher,D,Coldren Chris.Coldren@ucdenver.edu  303 724 6056
## GSM18406 Christopher,D,Coldren Chris.Coldren@ucdenver.edu  303 724 6056
## GSM18407 Christopher,D,Coldren Chris.Coldren@ucdenver.edu  303 724 6056
## GSM18408 Christopher,D,Coldren Chris.Coldren@ucdenver.edu  303 724 6056
##                                     contact_laboratory contact_department
## GSM18403 Pulmonary Sciences and Critical Care Medicine           Medicine
## GSM18404 Pulmonary Sciences and Critical Care Medicine           Medicine
## GSM18405 Pulmonary Sciences and Critical Care Medicine           Medicine
## GSM18406 Pulmonary Sciences and Critical Care Medicine           Medicine
## GSM18407 Pulmonary Sciences and Critical Care Medicine           Medicine
## GSM18408 Pulmonary Sciences and Critical Care Medicine           Medicine
##                                  contact_institute       contact_address
## GSM18403 University of Colorado School of Medicine 12700 East 17th Place
## GSM18404 University of Colorado School of Medicine 12700 East 17th Place
## GSM18405 University of Colorado School of Medicine 12700 East 17th Place
## GSM18406 University of Colorado School of Medicine 12700 East 17th Place
## GSM18407 University of Colorado School of Medicine 12700 East 17th Place
## GSM18408 University of Colorado School of Medicine 12700 East 17th Place
##          contact_city contact_state contact_zip/postal_code contact_country
## GSM18403       Aurora            CO                   80045             USA
## GSM18404       Aurora            CO                   80045             USA
## GSM18405       Aurora            CO                   80045             USA
## GSM18406       Aurora            CO                   80045             USA
## GSM18407       Aurora            CO                   80045             USA
## GSM18408       Aurora            CO                   80045             USA
##                                                                      supplementary_file
## GSM18403 ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM18nnn/GSM18403/suppl/GSM18403.CEL.gz
## GSM18404 ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM18nnn/GSM18404/suppl/GSM18404.CEL.gz
## GSM18405 ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM18nnn/GSM18405/suppl/GSM18405.CEL.gz
## GSM18406 ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM18nnn/GSM18406/suppl/GSM18406.CEL.gz
## GSM18407 ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM18nnn/GSM18407/suppl/GSM18407.CEL.gz
## GSM18408 ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM18nnn/GSM18408/suppl/GSM18408.CEL.gz
##          data_row_count
## GSM18403           7129
## GSM18404           7129
## GSM18405           7129
## GSM18406           7129
## GSM18407           7129
## GSM18408           7129

Time for this code chunk: 0.0097203254699707

Names will differ between experiments, but it is important to check that the “Control” group is the first level. If needed, re-level the groups.

pData(geo1)["Disease"] <- factor(pData(geo1)[,"description"],levels = c("Normal lung","'usual' emphysemic lung","Alpha-1 Antitrypsin Deficiency-related emphysemic lung"))

table(pData(geo1)$Disease)
## 
##                                            Normal lung 
##                                                      5 
##                                'usual' emphysemic lung 
##                                                      5 
## Alpha-1 Antitrypsin Deficiency-related emphysemic lung 
##                                                      5

Time for this code chunk: 0.0104248523712158
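The order of the levels matters because R sorts factor levels alphabetically by default, and the first level becomes the reference group of the model. A minimal base-R sketch with made-up labels shows how the default can put the wrong group first and how relevel() corrects it:

```r
# Made-up disease labels for four samples
status <- c("COPD", "Normal lung", "COPD", "Normal lung")

# Default: levels are sorted alphabetically, so "COPD" comes first
f_default <- factor(status)
print(levels(f_default))

# Make the control group the reference (first) level
f_fixed <- relevel(f_default, ref = "Normal lung")
print(levels(f_fixed))
```

Passing levels = explicitly to factor(), as done for GSE1122 above, achieves the same result and also fixes the order of the remaining groups.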

Differential expression analysis

Using the DE() function (described above), we fit a linear regression model to calculate the log fold change of all genes between a “Control” and a “COPD” group. With the default coeff = 2, the contrast reported for this experiment is “usual” emphysema vs normal lung. We also rename the columns by appending the GSE ID and, finally, save the output in a .CSV file.

de1 <- DE(geo1)
## [1] "(Intercept)"                                                  
## [2] "Disease'usual' emphysemic lung"                               
## [3] "DiseaseAlpha-1 Antitrypsin Deficiency-related emphysemic lung"

colnames(de1) <- str_c(colnames(de1),"_",gse1)
colnames(de1)
##  [1] "rownames_GSE1122"                        
##  [2] "ID_GSE1122"                              
##  [3] "GB_ACC_GSE1122"                          
##  [4] "SPOT_ID_GSE1122"                         
##  [5] "Species.Scientific.Name_GSE1122"         
##  [6] "Annotation.Date_GSE1122"                 
##  [7] "Sequence.Type_GSE1122"                   
##  [8] "Sequence.Source_GSE1122"                 
##  [9] "Target.Description_GSE1122"              
## [10] "Representative.Public.ID_GSE1122"        
## [11] "Gene.Title_GSE1122"                      
## [12] "Gene.Symbol_GSE1122"                     
## [13] "ENTREZ_GENE_ID_GSE1122"                  
## [14] "RefSeq.Transcript.ID_GSE1122"            
## [15] "Gene.Ontology.Biological.Process_GSE1122"
## [16] "Gene.Ontology.Cellular.Component_GSE1122"
## [17] "Gene.Ontology.Molecular.Function_GSE1122"
## [18] "logFC_GSE1122"                           
## [19] "CI.L_GSE1122"                            
## [20] "CI.R_GSE1122"                            
## [21] "AveExpr_GSE1122"                         
## [22] "t_GSE1122"                               
## [23] "P.Value_GSE1122"                         
## [24] "adj.P.Val_GSE1122"                       
## [25] "B_GSE1122"
write_csv(de1,
          file.path(OUTPUT_DIR, str_c("TableGenes_",gse1,"_",TODAY,".csv"))
          )

Time for this code chunk: 0.682673931121826
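Appending the GSE ID to each column keeps the statistics of different experiments distinguishable once their tables are placed side by side. The sketch below illustrates this with two made-up result tables in base R. Unlike the code above, it leaves the key column (Gene.Symbol) unsuffixed so it can serve as the join key; the helper add_suffix and the toy values are hypothetical, and this is not necessarily how the later vignettes combine the tables.

```r
# Two made-up differential expression tables
de_a <- data.frame(Gene.Symbol = c("A", "B", "C"), logFC = c(1, -0.5, 0.2))
de_b <- data.frame(Gene.Symbol = c("B", "C", "D"), logFC = c(-0.4, 0.1, 0.9))

# Hypothetical helper: suffix every column except the join key
add_suffix <- function(tab, gse, key = "Gene.Symbol") {
  rename <- names(tab) != key
  names(tab)[rename] <- paste0(names(tab)[rename], "_", gse)
  tab
}

# Inner join: keeps only the genes present in both experiments
merged <- merge(add_suffix(de_a, "GSE1122"),
                add_suffix(de_b, "GSE1650"),
                by = "Gene.Symbol")
print(merged)
```

Without the suffix, both logFC columns would collide and merge() would fall back to generic .x and .y names.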

GSE1650

Emphysema gene expression was measured using severe, mild and non-emphysema lung tissue. The authors separated the samples into two groups, 18 severe and 12 mild/non-emphysema, because of the limited sample size (7 mild, 5 non-emphysema). For the control group, 9 tissues were obtained from smokers with nodules suspicious for lung cancer.

The results of this study show oxidative stress, extracellular matrix synthesis, and inflammation pathways overexpressed in severe emphysema, whereas expression of endothelium-related genes was decreased.

The authors did not annotate the samples individually, so the meta-information is not available. We assumed that N refers to controls and L to severe emphysema; unfortunately, we cannot separate mild and non-emphysema samples because of the lack of information.

The raw data is in CEL files; the platform used is [HG-U133A] Affymetrix. The study was performed in Boston, USA (Boston University Medical Center).

Pre-process raw data

We pre-processed the raw data using the function rawCEL_normCEL; the plots are shown as additional output.

gse2<- rownames(gse_table)[2] 
norm2 <- rawCEL_normCEL(gse2)
## Loading required package: pd.hg.u133a
## Platform design info loaded.
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE1650/GSM28357.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE1650/GSM28358.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE1650/GSM28359.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE1650/GSM28360.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE1650/GSM28361.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE1650/GSM28362.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE1650/GSM28363.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE1650/GSM28364.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE1650/GSM28365.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE1650/GSM28366.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE1650/GSM28367.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE1650/GSM28368.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE1650/GSM28369.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE1650/GSM28370.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE1650/GSM28371.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE1650/GSM28372.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE1650/GSM28373.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE1650/GSM28374.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE1650/GSM28375.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE1650/GSM28376.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE1650/GSM28377.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE1650/GSM28378.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE1650/GSM28379.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE1650/GSM28380.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE1650/GSM28381.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE1650/GSM28382.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE1650/GSM28383.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE1650/GSM28384.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE1650/GSM28385.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE1650/GSM28386.CEL.gz

## Background correcting
## Normalizing
## Calculating Expression

norm2
## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 22283 features, 30 samples 
##   element names: exprs 
## protocolData
##   rowNames: GSM28357.CEL.gz GSM28358.CEL.gz ... GSM28386.CEL.gz (30
##     total)
##   varLabels: exprs dates
##   varMetadata: labelDescription channel
## phenoData
##   rowNames: GSM28357.CEL.gz GSM28358.CEL.gz ... GSM28386.CEL.gz (30
##     total)
##   varLabels: index
##   varMetadata: labelDescription channel
## featureData: none
## experimentData: use 'experimentData(object)'
## Annotation: pd.hg.u133a

Time for this code chunk: 22.4711818695068

Get annotation

We used the GEOquery package to obtain the sample annotations and combined them with our previously calculated pre-processed values to create an ExpressionSet object.

# get annotation using GEOquery package
geo2 <- get_GEO(gse2,norm2)
## Found 1 file(s)
## GSE1650_series_matrix.txt.gz
## Parsed with column specification:
## cols(
##   .default = col_double(),
##   ID_REF = col_character()
## )
## See spec(...) for full column specifications.
## File stored at:
## /tmp/Rtmp4ZNwux/GPL96.soft
## Data downloaded from GEOquery:
## $GSE1650_series_matrix.txt.gz
## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 22283 features, 30 samples 
##   element names: exprs 
## protocolData: none
## phenoData
##   sampleNames: GSM28357 GSM28358 ... GSM28386 (30 total)
##   varLabels: title geo_accession ... relation (30 total)
##   varMetadata: labelDescription
## featureData
##   featureNames: 1007_s_at 1053_at ... AFFX-TrpnX-M_at (22283 total)
##   fvarLabels: ID GB_ACC ... Gene Ontology Molecular Function (16 total)
##   fvarMetadata: Column Description labelDescription
## experimentData: use 'experimentData(object)'
##   pubMedIds: 15374838 
## Annotation: GPL96
## Colnames of GEOquery object:
## [1] "GSM28357" "GSM28358" "GSM28359" "GSM28360" "GSM28361"
## Colnames of calculated pre-processed data:
## [1] "GSM28357.CEL.gz" "GSM28358.CEL.gz" "GSM28359.CEL.gz" "GSM28360.CEL.gz"
## [5] "GSM28361.CEL.gz"

Time for this code chunk: 9.52107810974121
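
The two sets of column names printed above differ only by the .CEL.gz suffix, so they have to be harmonized before the pre-processed values can be paired with the GEO annotation. A minimal sketch of that step in base R follows; strip_cel_suffix and the toy objects are hypothetical and are not taken from get_GEO().

```r
# Hypothetical helper: drop a trailing ".CEL" or ".CEL.gz" from file-based names
strip_cel_suffix <- function(x) sub("\\.CEL(\\.gz)?$", "", x, ignore.case = TRUE)

# Toy RMA matrix whose columns are named after the CEL files
rma_values <- matrix(1:6, nrow = 2,
                     dimnames = list(c("1007_s_at", "1053_at"),
                                     c("GSM28357.CEL.gz", "GSM28358.CEL.gz",
                                       "GSM28359.CEL.gz")))
# Sample accessions as listed in the GEO annotation (deliberately in another order)
geo_samples <- c("GSM28359", "GSM28357", "GSM28358")

colnames(rma_values) <- strip_cel_suffix(colnames(rma_values))

# Fail early if any annotated sample has no expression column
stopifnot(all(geo_samples %in% colnames(rma_values)))

# Reorder the expression columns so they line up with the annotation
rma_values <- rma_values[, geo_samples, drop = FALSE]
colnames(rma_values)
```

Checking the match explicitly guards against silently pairing a sample with the wrong annotation row.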

Select column with COPD description

Each experiment has its own annotation, so we need to look for the column that indicates which samples are “Control” and which are “COPD”.

head(pData(geo2))
##          title geo_accession                status submission_date
## GSM28357   10L      GSM28357 Public on Aug 08 2004     Aug 06 2004
## GSM28358   10N      GSM28358 Public on Aug 08 2004     Aug 06 2004
## GSM28359   11L      GSM28359 Public on Aug 08 2004     Aug 06 2004
## GSM28360   11N      GSM28360 Public on Aug 08 2004     Aug 06 2004
## GSM28361   12L      GSM28361 Public on Aug 08 2004     Aug 06 2004
## GSM28362   12N      GSM28362 Public on Aug 08 2004     Aug 06 2004
##          last_update_date type channel_count source_name_ch1 organism_ch1
## GSM28357      Aug 18 2014  RNA             1     Lung Tissue Homo sapiens
## GSM28358      Aug 18 2014  RNA             1     Lung Tissue Homo sapiens
## GSM28359      Aug 18 2014  RNA             1     Lung Tissue Homo sapiens
## GSM28360      Aug 18 2014  RNA             1     Lung Tissue Homo sapiens
## GSM28361      Aug 18 2014  RNA             1     Lung Tissue Homo sapiens
## GSM28362      Aug 18 2014  RNA             1     Lung Tissue Homo sapiens
##          molecule_ch1 taxid_ch1
## GSM28357    total RNA      9606
## GSM28358    total RNA      9606
## GSM28359    total RNA      9606
## GSM28360    total RNA      9606
## GSM28361    total RNA      9606
## GSM28362    total RNA      9606
##                                                description
## GSM28357 Lung tissue and ressected lung taken from smokers
## GSM28358 Lung tissue and ressected lung taken from smokers
## GSM28359 Lung tissue and ressected lung taken from smokers
## GSM28360 Lung tissue and ressected lung taken from smokers
## GSM28361 Lung tissue and ressected lung taken from smokers
## GSM28362 Lung tissue and ressected lung taken from smokers
##                                                                                                    description.1
## GSM28357 Keywords = Smoking, COPD, lung reduction, airway, molecular screen for spatially restricted transcripts
## GSM28358 Keywords = Smoking, COPD, lung reduction, airway, molecular screen for spatially restricted transcripts
## GSM28359 Keywords = Smoking, COPD, lung reduction, airway, molecular screen for spatially restricted transcripts
## GSM28360 Keywords = Smoking, COPD, lung reduction, airway, molecular screen for spatially restricted transcripts
## GSM28361 Keywords = Smoking, COPD, lung reduction, airway, molecular screen for spatially restricted transcripts
## GSM28362 Keywords = Smoking, COPD, lung reduction, airway, molecular screen for spatially restricted transcripts
##          platform_id contact_name contact_email contact_phone  contact_fax
## GSM28357       GPL96 Avrum,,Spira aspira@bu.edu  617-638-4860 617-536-8093
## GSM28358       GPL96 Avrum,,Spira aspira@bu.edu  617-638-4860 617-536-8093
## GSM28359       GPL96 Avrum,,Spira aspira@bu.edu  617-638-4860 617-536-8093
## GSM28360       GPL96 Avrum,,Spira aspira@bu.edu  617-638-4860 617-536-8093
## GSM28361       GPL96 Avrum,,Spira aspira@bu.edu  617-638-4860 617-536-8093
## GSM28362       GPL96 Avrum,,Spira aspira@bu.edu  617-638-4860 617-536-8093
##          contact_laboratory                   contact_department
## GSM28357    Pulmonomics Lab Pulmonary and Critical Care Medicine
## GSM28358    Pulmonomics Lab Pulmonary and Critical Care Medicine
## GSM28359    Pulmonomics Lab Pulmonary and Critical Care Medicine
## GSM28360    Pulmonomics Lab Pulmonary and Critical Care Medicine
## GSM28361    Pulmonomics Lab Pulmonary and Critical Care Medicine
## GSM28362    Pulmonomics Lab Pulmonary and Critical Care Medicine
##                         contact_institute         contact_address contact_city
## GSM28357 Boston University Medical Center 715 Albany Street, R304       Boston
## GSM28358 Boston University Medical Center 715 Albany Street, R304       Boston
## GSM28359 Boston University Medical Center 715 Albany Street, R304       Boston
## GSM28360 Boston University Medical Center 715 Albany Street, R304       Boston
## GSM28361 Boston University Medical Center 715 Albany Street, R304       Boston
## GSM28362 Boston University Medical Center 715 Albany Street, R304       Boston
##          contact_state contact_zip/postal_code contact_country
## GSM28357            MA                   02118             USA
## GSM28358            MA                   02118             USA
## GSM28359            MA                   02118             USA
## GSM28360            MA                   02118             USA
## GSM28361            MA                   02118             USA
## GSM28362            MA                   02118             USA
##                    contact_web_link
## GSM28357 http://www.pulmonomics.com
## GSM28358 http://www.pulmonomics.com
## GSM28359 http://www.pulmonomics.com
## GSM28360 http://www.pulmonomics.com
## GSM28361 http://www.pulmonomics.com
## GSM28362 http://www.pulmonomics.com
##                                                                      supplementary_file
## GSM28357 ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM28nnn/GSM28357/suppl/GSM28357.CEL.gz
## GSM28358 ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM28nnn/GSM28358/suppl/GSM28358.CEL.gz
## GSM28359 ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM28nnn/GSM28359/suppl/GSM28359.CEL.gz
## GSM28360 ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM28nnn/GSM28360/suppl/GSM28360.CEL.gz
## GSM28361 ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM28nnn/GSM28361/suppl/GSM28361.CEL.gz
## GSM28362 ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM28nnn/GSM28362/suppl/GSM28362.CEL.gz
##          data_row_count                relation
## GSM28357          22283 Reanalyzed by: GSE60486
## GSM28358          22283 Reanalyzed by: GSE60486
## GSM28359          22283 Reanalyzed by: GSE60486
## GSM28360          22283 Reanalyzed by: GSE60486
## GSM28361          22283 Reanalyzed by: GSE60486
## GSM28362          22283 Reanalyzed by: GSE60486

Time for this code chunk: 0.0101165771484375
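
Scanning dozens of annotation columns by eye is error-prone. One possible shortcut is to search every column for keywords; the sketch below does this in base R on a toy table. The helper find_columns and the toy values are hypothetical, and the keywords would need to be adapted to each experiment.

```r
# Return the names of the columns in which at least one value matches `pattern`
find_columns <- function(pheno, pattern) {
  hits <- vapply(
    pheno,
    function(column) any(grepl(pattern, as.character(column), ignore.case = TRUE)),
    logical(1)
  )
  names(pheno)[hits]
}

# Toy annotation table with made-up values
pheno <- data.frame(
  title = c("S1-COPD", "S2-COPD", "S3-Control"),
  source_name_ch1 = c("Lung Tissue", "Lung Tissue", "Lung Tissue"),
  characteristics_ch1 = c("copd status: 1", "copd status: 1", "copd status: 0"),
  stringsAsFactors = FALSE
)

candidates <- find_columns(pheno, "copd|control")
candidates

# Inspect how the samples are distributed in each candidate column
lapply(pheno[candidates], table)
```

A column that matches the keyword but holds the same value for every sample cannot separate the groups, which is why the tabulation step is worth keeping.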

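Group names differ between experiments, but it is important to check that the “Control” group is the first level. If needed, re-level the groups. In this experiment the sample titles carry the grouping (for example “10L” and “10N”), so we count the occurrences of “L” in each title to build a 0/1 Disease variable.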

pData(geo2)["Disease"] <- str_count(as.character(pData(geo2)[,"title"]),"L")
table(pData(geo2)$Disease)
## 
##  0  1 
## 12 18

Time for this code chunk: 0.0110573768615723
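
When the grouping is stored as a factor rather than as 0/1, the first level is the reference group in a model built with model.matrix() or lm(), which is why the order of the levels matters. The sketch below shows one way to set and check the reference level in base R. The toy titles and the mapping of “L” to COPD are assumptions made for illustration only.

```r
# Toy sample titles (made-up); assumption: titles containing "L" are COPD samples
titles <- c("10L", "10N", "11L", "11N", "12L")

# Explicit `levels` puts "Control" first instead of relying on the default sort order
disease <- factor(ifelse(grepl("L", titles, fixed = TRUE), "COPD", "Control"),
                  levels = c("Control", "COPD"))
levels(disease)

# If a factor was created with the wrong order, relevel() changes the reference
wrong_order <- factor(c("COPD", "Control", "COPD"), levels = c("COPD", "Control"))
fixed_order <- relevel(wrong_order, ref = "Control")
levels(fixed_order)

# With "Control" as reference, the design matrix has a single COPD indicator column
colnames(model.matrix(~ disease))
```

With this coding a positive coefficient means higher expression in the COPD group than in the control group.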

Differential expression analysis

Using the DE() function (described above), we fitted a linear regression model to calculate the log fold change of all genes between the “Control” and “COPD” groups. We also renamed the columns by appending the GSE ID and, finally, saved the output to a .csv file.

de2 <- DE(geo2)
## [1] "(Intercept)" "Disease"

colnames(de2) <- str_c(colnames(de2),"_",gse2)
colnames(de2)
##  [1] "rownames_GSE1650"                        
##  [2] "ID_GSE1650"                              
##  [3] "GB_ACC_GSE1650"                          
##  [4] "SPOT_ID_GSE1650"                         
##  [5] "Species.Scientific.Name_GSE1650"         
##  [6] "Annotation.Date_GSE1650"                 
##  [7] "Sequence.Type_GSE1650"                   
##  [8] "Sequence.Source_GSE1650"                 
##  [9] "Target.Description_GSE1650"              
## [10] "Representative.Public.ID_GSE1650"        
## [11] "Gene.Title_GSE1650"                      
## [12] "Gene.Symbol_GSE1650"                     
## [13] "ENTREZ_GENE_ID_GSE1650"                  
## [14] "RefSeq.Transcript.ID_GSE1650"            
## [15] "Gene.Ontology.Biological.Process_GSE1650"
## [16] "Gene.Ontology.Cellular.Component_GSE1650"
## [17] "Gene.Ontology.Molecular.Function_GSE1650"
## [18] "logFC_GSE1650"                           
## [19] "CI.L_GSE1650"                            
## [20] "CI.R_GSE1650"                            
## [21] "AveExpr_GSE1650"                         
## [22] "t_GSE1650"                               
## [23] "P.Value_GSE1650"                         
## [24] "adj.P.Val_GSE1650"                       
## [25] "B_GSE1650"
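# The file name is passed positionally because this argument is called
# `path` in older readr releases and `file` in newer ones
write_csv(de2,
          file.path(OUTPUT_DIR, str_c("TableGenes_",gse2,"_",TODAY,".csv"))
          )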

Time for this code chunk: 1.76264119148254
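
To make the meaning of logFC concrete: with a design made of an intercept plus a 0/1 Disease column, the Disease coefficient of a linear model fitted to log2 expression values equals the difference between the two group means. The sketch below checks this with ordinary least squares in base R on simulated data. It is not the DE() function, which also reports statistics such as t, adj.P.Val and B that this sketch does not compute; all object names and values here are made up.

```r
set.seed(42)

# Simulated log2 expression: 4 genes x 6 samples (3 controls, then 3 cases)
disease <- c(0, 0, 0, 1, 1, 1)
expr <- matrix(rnorm(4 * 6, mean = 8, sd = 0.5), nrow = 4,
               dimnames = list(paste0("gene", 1:4), paste0("sample", 1:6)))
# Shift gene1 upwards in the cases to create one true difference
expr["gene1", disease == 1] <- expr["gene1", disease == 1] + 2

# One linear model per gene: lm() accepts a matrix response (samples in rows)
fit <- lm(t(expr) ~ disease)
log_fc <- coef(fit)["disease", ]

# The same quantity computed directly as a difference of group means
mean_diff <- rowMeans(expr[, disease == 1]) - rowMeans(expr[, disease == 0])

round(cbind(log_fc, mean_diff), 3)
all.equal(unname(log_fc), unname(mean_diff))
```

Because the values are on the log2 scale, a logFC of 1 corresponds to a two-fold difference between the groups.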

GSE27597

This experiment measured 8 lung tissue samples, taken from 8 different regions, for each subject. In total there are 64 gene expression samples from 6 patients with severe COPD and 2 donors. The control samples came from donated lungs for which no suitable recipient was found; one donor was a smoker and the other had never smoked.

The results showed that genes involved in inflammation were overexpressed and genes involved in tissue repair were underexpressed in emphysema.

This GSE ID contains two experiments that use different platforms. One measured COPD patients and controls using the Affymetrix [HuEx-1_0-st] array. The other measured a fibroblast cell line using the Affymetrix [HuGene10stv1_Hs_ENSG] array; for it, the authors treated human lung fibroblast cultures (HFL-1) with two concentrations of GHK or with TGFβ1.

Pre-process raw data

We pre-processed the raw data using the function rawCEL_normCEL; plots are shown as additional output.

gse3 <- rownames(gse_table)[3]
norm3 <- rawCEL_normCEL(gse3)
## Loading required package: pd.huex.1.0.st.v2
## Platform design info loaded.
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684089.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684090.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684091.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684092.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684093.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684094.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684095.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684096.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684097.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684098.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684101.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684103.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684105.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684107.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684109.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684112.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684114.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684117.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684119.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684120.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684121.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684122.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684123.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684124.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684125.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684126.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684127.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684128.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684129.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684130.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684132.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684133.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684135.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684136.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684139.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684141.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684143.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684144.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684145.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684146.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684147.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684148.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684149.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684150.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684151.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684152.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684153.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684154.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684155.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684156.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684157.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684158.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684159.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684160.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684161.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684162.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684163.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684164.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684165.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684166.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684167.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684168.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684169.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE27597/GSM684170.CEL.gz

## Background correcting
## Normalizing
## Calculating Expression

norm3
## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 22011 features, 64 samples 
##   element names: exprs 
## protocolData
##   rowNames: GSM684089.CEL.gz GSM684090.CEL.gz ... GSM684170.CEL.gz (64
##     total)
##   varLabels: exprs dates
##   varMetadata: labelDescription channel
## phenoData
##   rowNames: GSM684089.CEL.gz GSM684090.CEL.gz ... GSM684170.CEL.gz (64
##     total)
##   varLabels: index
##   varMetadata: labelDescription channel
## featureData: none
## experimentData: use 'experimentData(object)'
## Annotation: pd.huex.1.0.st.v2

Time for this code chunk: 2.09101250569026
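
The messages above (“Platform design info loaded.”, “Background correcting”, and the pd.huex.1.0.st.v2 annotation) are consistent with the oligo package. As a rough sketch of what a wrapper such as rawCEL_normCEL might do, and not its actual definition, the function below reads all CEL files in a folder and runs RMA. The name rma_from_dir and its arguments are hypothetical; the target argument is only meaningful for exon and gene ST arrays, where "core" requests gene-level summaries.

```r
# Hypothetical wrapper: read every CEL file in `cel_dir` and return RMA values.
# Requires the Bioconductor packages oligo and oligoClasses plus the matching
# platform design package (for example pd.huex.1.0.st.v2).
rma_from_dir <- function(cel_dir, target = NULL) {
  # List plain and gzipped CEL files with their full paths
  cel_files <- oligoClasses::list.celfiles(cel_dir, full.names = TRUE,
                                           listGzipped = TRUE)
  if (length(cel_files) == 0) {
    stop("No CEL files found in ", cel_dir)
  }
  raw <- oligo::read.celfiles(cel_files)
  if (is.null(target)) {
    # Classic 3' expression arrays: probeset-level RMA
    oligo::rma(raw)
  } else {
    # Exon / gene ST arrays: choose the summarization level, e.g. "core"
    oligo::rma(raw, target = target)
  }
}

# Usage (not run here; the path is illustrative):
# norm <- rma_from_dir(file.path(DATA_DIR, "celfiles", "GSE27597"), target = "core")
```

Keeping the target optional lets the same wrapper serve both the HG-U133A arrays of GSE1650 and the exon arrays of this experiment.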

Get annotation

We used the GEOquery package to obtain the sample annotations and combined them with our previously calculated pre-processed values to create an ExpressionSet object.

# get annotation using GEOquery package
geo3 <- get_GEO(gse3,i=2,norm3)
## Found 2 file(s)
## GSE27597-GPL13243_series_matrix.txt.gz
## Parsed with column specification:
## cols(
##   ID_REF = col_character(),
##   GSM684494 = col_double(),
##   GSM684495 = col_double(),
##   GSM684496 = col_double(),
##   GSM684497 = col_double(),
##   GSM684498 = col_double(),
##   GSM684499 = col_double(),
##   GSM684500 = col_double(),
##   GSM684501 = col_double()
## )
## File stored at:
## /tmp/Rtmp4ZNwux/GPL13243.soft
## GSE27597-GPL5175_series_matrix.txt.gz
## Parsed with column specification:
## cols(
##   .default = col_double()
## )
## See spec(...) for full column specifications.
## File stored at:
## /tmp/Rtmp4ZNwux/GPL5175.soft
## Data downloaded from GEOquery:
## $`GSE27597-GPL13243_series_matrix.txt.gz`
## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 19793 features, 8 samples 
##   element names: exprs 
## protocolData: none
## phenoData
##   sampleNames: GSM684494 GSM684495 ... GSM684501 (8 total)
##   varLabels: title geo_accession ... time:ch1 (36 total)
##   varMetadata: labelDescription
## featureData
##   featureNames: 10000_at 10001_at ... 9_at (19793 total)
##   fvarLabels: ID SPOT_ID Description
##   fvarMetadata: Column Description labelDescription
## experimentData: use 'experimentData(object)'
##   pubMedIds: 22937864
## 24380442 
## Annotation: GPL13243 
## 
## $`GSE27597-GPL5175_series_matrix.txt.gz`
## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 22011 features, 64 samples 
##   element names: exprs 
## protocolData: none
## phenoData
##   sampleNames: GSM684089 GSM684090 ... GSM684170 (64 total)
##   varLabels: title geo_accession ... slice:ch1 (46 total)
##   varMetadata: labelDescription
## featureData
##   featureNames: 2315554 2315633 ... 7385696 (22011 total)
##   fvarLabels: ID GB_LIST ... category (12 total)
##   fvarMetadata: Column Description labelDescription
## experimentData: use 'experimentData(object)'
##   pubMedIds: 22937864
## 24380442 
## Annotation: GPL5175
## Colnames of GEOquery object:
## [1] "GSM684089" "GSM684090" "GSM684091" "GSM684092" "GSM684093"
## Colnames of calculated pre-processed data:
## [1] "GSM684089.CEL.gz" "GSM684090.CEL.gz" "GSM684091.CEL.gz" "GSM684092.CEL.gz"
## [5] "GSM684093.CEL.gz"

Time for this code chunk: 18.4546620845795
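
GSE27597 spans two platforms, so the download yields two series matrices, and the lung tissue set (GPL5175, 64 samples) is the second one, hence i=2. Selecting the element by platform ID rather than by position is a possible safeguard in case the order ever changes. The sketch below illustrates the idea on a toy list that mimics the names printed above; select_platform is a hypothetical helper and the list contents are placeholders, not ExpressionSet objects.

```r
# Hypothetical helper: pick the list element whose name mentions `platform_id`
select_platform <- function(series_list, platform_id) {
  # Series matrix names look like "<GSE>-<GPL>_series_matrix.txt.gz"
  idx <- grep(paste0("-", platform_id, "_"), names(series_list), fixed = TRUE)
  if (length(idx) != 1) {
    stop("Expected exactly one match for ", platform_id, ", found ", length(idx))
  }
  series_list[[idx]]
}

# Toy stand-in for the list returned for a multi-platform series
series_list <- list(
  "GSE27597-GPL13243_series_matrix.txt.gz" = "fibroblast cell line data (placeholder)",
  "GSE27597-GPL5175_series_matrix.txt.gz"  = "lung tissue data (placeholder)"
)

select_platform(series_list, "GPL5175")
```

Stopping when zero or several names match avoids quietly analysing the wrong platform.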

Select column with COPD description

Each experiment has its own annotation, so we needed to look for the column that indicates which samples are “Control” and which are “COPD”.
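
GEO commonly stores sample attributes as “key: value” strings in the characteristics_ch1 columns, as this series does with copd status: 1. The sketch below shows one way to turn such strings into a 0/1 variable in base R; parse_characteristic is a hypothetical helper and the input vector is a toy example.

```r
# Hypothetical helper: extract the value part of "key: value" strings
parse_characteristic <- function(x, key) {
  x <- as.character(x)
  has_key <- startsWith(tolower(x), paste0(tolower(key), ":"))
  # Remove everything up to and including the first colon, then trim spaces
  value <- trimws(sub("^[^:]*:", "", x))
  # Entries that do not carry the requested key become NA
  value[!has_key] <- NA_character_
  value
}

# Toy input; the last entry belongs to a different key on purpose
status <- c("copd status: 1", "copd status: 1", "copd status: 0", "Sex: Male")
disease <- as.integer(parse_characteristic(status, "copd status"))
disease
table(disease, useNA = "ifany")
```

Returning NA for entries with another key makes it visible when the wrong column was selected.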

head(pData(geo3))
##                  title geo_accession                status submission_date
## GSM684089 6965-03-COPD     GSM684089 Public on Sep 11 2012     Mar 01 2011
## GSM684090 6965-04-COPD     GSM684090 Public on Sep 11 2012     Mar 01 2011
## GSM684091 6965-05-COPD     GSM684091 Public on Sep 11 2012     Mar 01 2011
## GSM684092 6965-06-COPD     GSM684092 Public on Sep 11 2012     Mar 01 2011
## GSM684093 6965-07-COPD     GSM684093 Public on Sep 11 2012     Mar 01 2011
## GSM684094 6965-08-COPD     GSM684094 Public on Sep 11 2012     Mar 01 2011
##           last_update_date type channel_count   source_name_ch1 organism_ch1
## GSM684089      Sep 11 2012  RNA             1 Whole lung tissue Homo sapiens
## GSM684090      Sep 11 2012  RNA             1 Whole lung tissue Homo sapiens
## GSM684091      Sep 11 2012  RNA             1 Whole lung tissue Homo sapiens
## GSM684092      Sep 11 2012  RNA             1 Whole lung tissue Homo sapiens
## GSM684093      Sep 11 2012  RNA             1 Whole lung tissue Homo sapiens
## GSM684094      Sep 11 2012  RNA             1 Whole lung tissue Homo sapiens
##           characteristics_ch1 characteristics_ch1.1 characteristics_ch1.2
## GSM684089     lm: 870.7318499         patient: 6965              slice: 3
## GSM684090     lm: 639.6153396         patient: 6965              slice: 4
## GSM684091     lm: 982.9172824         patient: 6965              slice: 5
## GSM684092      lm: 801.976686         patient: 6965              slice: 6
## GSM684093     lm: 726.5157591         patient: 6965              slice: 7
## GSM684094      lm: 663.336977         patient: 6965              slice: 8
##           characteristics_ch1.3 characteristics_ch1.4 characteristics_ch1.5
## GSM684089        copd status: 1             Sex: Male               age: 62
## GSM684090        copd status: 1             Sex: Male               age: 62
## GSM684091        copd status: 1             Sex: Male               age: 62
## GSM684092        copd status: 1             Sex: Male               age: 62
## GSM684093        copd status: 1             Sex: Male               age: 62
## GSM684094        copd status: 1             Sex: Male               age: 62
##           characteristics_ch1.6 characteristics_ch1.7 molecule_ch1
## GSM684089        pack years: 50           notes: none    total RNA
## GSM684090        pack years: 50           notes: none    total RNA
## GSM684091        pack years: 50           notes: none    total RNA
## GSM684092        pack years: 50           notes: none    total RNA
## GSM684093        pack years: 50           notes: none    total RNA
## GSM684094        pack years: 50           notes: none    total RNA
##                                                                                                                                                                                                                                                                                                             extract_protocol_ch1
## GSM684089 High molecular weight RNA was isolated from tissue cores using the miRNeasy Mini Kit (Qiagen). The RNA integrity was assessed using an Agilent 2100 Bioanalyzer and RNA purity was assessed using a NanoDrop spectrophotometer.  One ug of RNA was processed and used as starting material for the microarray studies.
## GSM684090 High molecular weight RNA was isolated from tissue cores using the miRNeasy Mini Kit (Qiagen). The RNA integrity was assessed using an Agilent 2100 Bioanalyzer and RNA purity was assessed using a NanoDrop spectrophotometer.  One ug of RNA was processed and used as starting material for the microarray studies.
## GSM684091 High molecular weight RNA was isolated from tissue cores using the miRNeasy Mini Kit (Qiagen). The RNA integrity was assessed using an Agilent 2100 Bioanalyzer and RNA purity was assessed using a NanoDrop spectrophotometer.  One ug of RNA was processed and used as starting material for the microarray studies.
## GSM684092 High molecular weight RNA was isolated from tissue cores using the miRNeasy Mini Kit (Qiagen). The RNA integrity was assessed using an Agilent 2100 Bioanalyzer and RNA purity was assessed using a NanoDrop spectrophotometer.  One ug of RNA was processed and used as starting material for the microarray studies.
## GSM684093 High molecular weight RNA was isolated from tissue cores using the miRNeasy Mini Kit (Qiagen). The RNA integrity was assessed using an Agilent 2100 Bioanalyzer and RNA purity was assessed using a NanoDrop spectrophotometer.  One ug of RNA was processed and used as starting material for the microarray studies.
## GSM684094 High molecular weight RNA was isolated from tissue cores using the miRNeasy Mini Kit (Qiagen). The RNA integrity was assessed using an Agilent 2100 Bioanalyzer and RNA purity was assessed using a NanoDrop spectrophotometer.  One ug of RNA was processed and used as starting material for the microarray studies.
##           label_ch1
## GSM684089    biotin
## GSM684090    biotin
## GSM684091    biotin
## GSM684092    biotin
## GSM684093    biotin
## GSM684094    biotin
##                                                                                                                                                                                                           label_protocol_ch1
## GSM684089 Ribosomal RNA was first removed using the RiboMinus Human/Mouse Transcriptome Isolation Kit (Invitrogen, Carlsbad, CA). This treated RNA was then converted to cDNA and subsequently processed and biotin-labeled.
## GSM684090 Ribosomal RNA was first removed using the RiboMinus Human/Mouse Transcriptome Isolation Kit (Invitrogen, Carlsbad, CA). This treated RNA was then converted to cDNA and subsequently processed and biotin-labeled.
## GSM684091 Ribosomal RNA was first removed using the RiboMinus Human/Mouse Transcriptome Isolation Kit (Invitrogen, Carlsbad, CA). This treated RNA was then converted to cDNA and subsequently processed and biotin-labeled.
## GSM684092 Ribosomal RNA was first removed using the RiboMinus Human/Mouse Transcriptome Isolation Kit (Invitrogen, Carlsbad, CA). This treated RNA was then converted to cDNA and subsequently processed and biotin-labeled.
## GSM684093 Ribosomal RNA was first removed using the RiboMinus Human/Mouse Transcriptome Isolation Kit (Invitrogen, Carlsbad, CA). This treated RNA was then converted to cDNA and subsequently processed and biotin-labeled.
## GSM684094 Ribosomal RNA was first removed using the RiboMinus Human/Mouse Transcriptome Isolation Kit (Invitrogen, Carlsbad, CA). This treated RNA was then converted to cDNA and subsequently processed and biotin-labeled.
##           taxid_ch1
## GSM684089      9606
## GSM684090      9606
## GSM684091      9606
## GSM684092      9606
## GSM684093      9606
## GSM684094      9606
##                                                                                                                                                                                                                                                                                                                                                                             hyb_protocol
## GSM684089 cDNA was end-labeled with a biotinylated dideoxynucleotide using terminal transferase. Five and a half micrograms of biotinylated cDNA was added to a hybridization cocktail, loaded on a Human Exon 1.0 ST GeneChip and hybridized for 16 hours at 45 ºC and 60 rpm. Following hybridization, the array was washed and stained according to the standard Affymetrix protocol.
## GSM684090 cDNA was end-labeled with a biotinylated dideoxynucleotide using terminal transferase. Five and a half micrograms of biotinylated cDNA was added to a hybridization cocktail, loaded on a Human Exon 1.0 ST GeneChip and hybridized for 16 hours at 45 ºC and 60 rpm. Following hybridization, the array was washed and stained according to the standard Affymetrix protocol.
## GSM684091 cDNA was end-labeled with a biotinylated dideoxynucleotide using terminal transferase. Five and a half micrograms of biotinylated cDNA was added to a hybridization cocktail, loaded on a Human Exon 1.0 ST GeneChip and hybridized for 16 hours at 45 ºC and 60 rpm. Following hybridization, the array was washed and stained according to the standard Affymetrix protocol.
## GSM684092 cDNA was end-labeled with a biotinylated dideoxynucleotide using terminal transferase. Five and a half micrograms of biotinylated cDNA was added to a hybridization cocktail, loaded on a Human Exon 1.0 ST GeneChip and hybridized for 16 hours at 45 ºC and 60 rpm. Following hybridization, the array was washed and stained according to the standard Affymetrix protocol.
## GSM684093 cDNA was end-labeled with a biotinylated dideoxynucleotide using terminal transferase. Five and a half micrograms of biotinylated cDNA was added to a hybridization cocktail, loaded on a Human Exon 1.0 ST GeneChip and hybridized for 16 hours at 45 ºC and 60 rpm. Following hybridization, the array was washed and stained according to the standard Affymetrix protocol.
## GSM684094 cDNA was end-labeled with a biotinylated dideoxynucleotide using terminal transferase. Five and a half micrograms of biotinylated cDNA was added to a hybridization cocktail, loaded on a Human Exon 1.0 ST GeneChip and hybridized for 16 hours at 45 ºC and 60 rpm. Following hybridization, the array was washed and stained according to the standard Affymetrix protocol.
##                                                                                                                        scan_protocol
## GSM684089 Arrays were scanned using an Affymetrix GeneChip Scanner 3000. These scans were used to generate CEL files for each array.
## GSM684090 Arrays were scanned using an Affymetrix GeneChip Scanner 3000. These scans were used to generate CEL files for each array.
## GSM684091 Arrays were scanned using an Affymetrix GeneChip Scanner 3000. These scans were used to generate CEL files for each array.
## GSM684092 Arrays were scanned using an Affymetrix GeneChip Scanner 3000. These scans were used to generate CEL files for each array.
## GSM684093 Arrays were scanned using an Affymetrix GeneChip Scanner 3000. These scans were used to generate CEL files for each array.
## GSM684094 Arrays were scanned using an Affymetrix GeneChip Scanner 3000. These scans were used to generate CEL files for each array.
##                                                                                                                                                                                                                                                               description
## GSM684089 RNA was treated for removal of ribosomal RNA, from which cDNA was synthesized, biotin-labeled, and hybridized to Human Exon 1.0 ST GeneChips. Transcript level data was obtained with RMA and the Affymetrix CDF using Expression Console Version 1.0 software.
## GSM684090 RNA was treated for removal of ribosomal RNA, from which cDNA was synthesized, biotin-labeled, and hybridized to Human Exon 1.0 ST GeneChips. Transcript level data was obtained with RMA and the Affymetrix CDF using Expression Console Version 1.0 software.
## GSM684091 RNA was treated for removal of ribosomal RNA, from which cDNA was synthesized, biotin-labeled, and hybridized to Human Exon 1.0 ST GeneChips. Transcript level data was obtained with RMA and the Affymetrix CDF using Expression Console Version 1.0 software.
## GSM684092 RNA was treated for removal of ribosomal RNA, from which cDNA was synthesized, biotin-labeled, and hybridized to Human Exon 1.0 ST GeneChips. Transcript level data was obtained with RMA and the Affymetrix CDF using Expression Console Version 1.0 software.
## GSM684093 RNA was treated for removal of ribosomal RNA, from which cDNA was synthesized, biotin-labeled, and hybridized to Human Exon 1.0 ST GeneChips. Transcript level data was obtained with RMA and the Affymetrix CDF using Expression Console Version 1.0 software.
## GSM684094 RNA was treated for removal of ribosomal RNA, from which cDNA was synthesized, biotin-labeled, and hybridized to Human Exon 1.0 ST GeneChips. Transcript level data was obtained with RMA and the Affymetrix CDF using Expression Console Version 1.0 software.
##                                                                                                                                                                                                                                                                                                                                                                                    data_processing
## GSM684089 Expression Console Version 1.1 (Affymetrix Inc.) was used to generate transcript-level gene expression estimates via the robust multichip average (RMA) algorithm using the Affymetrix CDF (Gene Level - Core: RMA Sketch).  Gene symbols of transcript ids were retrieved using DAVID (http://david.abcc.ncifcrf.gov/). No filtering of genes was performed before statistical testing.
## GSM684090 Expression Console Version 1.1 (Affymetrix Inc.) was used to generate transcript-level gene expression estimates via the robust multichip average (RMA) algorithm using the Affymetrix CDF (Gene Level - Core: RMA Sketch).  Gene symbols of transcript ids were retrieved using DAVID (http://david.abcc.ncifcrf.gov/). No filtering of genes was performed before statistical testing.
## GSM684091 Expression Console Version 1.1 (Affymetrix Inc.) was used to generate transcript-level gene expression estimates via the robust multichip average (RMA) algorithm using the Affymetrix CDF (Gene Level - Core: RMA Sketch).  Gene symbols of transcript ids were retrieved using DAVID (http://david.abcc.ncifcrf.gov/). No filtering of genes was performed before statistical testing.
## GSM684092 Expression Console Version 1.1 (Affymetrix Inc.) was used to generate transcript-level gene expression estimates via the robust multichip average (RMA) algorithm using the Affymetrix CDF (Gene Level - Core: RMA Sketch).  Gene symbols of transcript ids were retrieved using DAVID (http://david.abcc.ncifcrf.gov/). No filtering of genes was performed before statistical testing.
## GSM684093 Expression Console Version 1.1 (Affymetrix Inc.) was used to generate transcript-level gene expression estimates via the robust multichip average (RMA) algorithm using the Affymetrix CDF (Gene Level - Core: RMA Sketch).  Gene symbols of transcript ids were retrieved using DAVID (http://david.abcc.ncifcrf.gov/). No filtering of genes was performed before statistical testing.
## GSM684094 Expression Console Version 1.1 (Affymetrix Inc.) was used to generate transcript-level gene expression estimates via the robust multichip average (RMA) algorithm using the Affymetrix CDF (Gene Level - Core: RMA Sketch).  Gene symbols of transcript ids were retrieved using DAVID (http://david.abcc.ncifcrf.gov/). No filtering of genes was performed before statistical testing.
##                                                                                                data_processing.1
## GSM684089 Quantile sketch normalized transcript level expression using the RMA algorithm with the Affymetrix CDF
## GSM684090 Quantile sketch normalized transcript level expression using the RMA algorithm with the Affymetrix CDF
## GSM684091 Quantile sketch normalized transcript level expression using the RMA algorithm with the Affymetrix CDF
## GSM684092 Quantile sketch normalized transcript level expression using the RMA algorithm with the Affymetrix CDF
## GSM684093 Quantile sketch normalized transcript level expression using the RMA algorithm with the Affymetrix CDF
## GSM684094 Quantile sketch normalized transcript level expression using the RMA algorithm with the Affymetrix CDF
##           platform_id          contact_name contact_email contact_institute
## GSM684089     GPL5175 Joshua,David,Campbell   camp@bu.edu Boston University
## GSM684090     GPL5175 Joshua,David,Campbell   camp@bu.edu Boston University
## GSM684091     GPL5175 Joshua,David,Campbell   camp@bu.edu Boston University
## GSM684092     GPL5175 Joshua,David,Campbell   camp@bu.edu Boston University
## GSM684093     GPL5175 Joshua,David,Campbell   camp@bu.edu Boston University
## GSM684094     GPL5175 Joshua,David,Campbell   camp@bu.edu Boston University
##                      contact_address contact_city contact_state
## GSM684089 72 East Concord St., E-632       Boston            MA
## GSM684090 72 East Concord St., E-632       Boston            MA
## GSM684091 72 East Concord St., E-632       Boston            MA
## GSM684092 72 East Concord St., E-632       Boston            MA
## GSM684093 72 East Concord St., E-632       Boston            MA
## GSM684094 72 East Concord St., E-632       Boston            MA
##           contact_zip/postal_code contact_country
## GSM684089                   02118             USA
## GSM684090                   02118             USA
## GSM684091                   02118             USA
## GSM684092                   02118             USA
## GSM684093                   02118             USA
## GSM684094                   02118             USA
##                                                                          supplementary_file
## GSM684089 ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM684nnn/GSM684089/suppl/GSM684089.CEL.gz
## GSM684090 ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM684nnn/GSM684090/suppl/GSM684090.CEL.gz
## GSM684091 ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM684nnn/GSM684091/suppl/GSM684091.CEL.gz
## GSM684092 ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM684nnn/GSM684092/suppl/GSM684092.CEL.gz
## GSM684093 ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM684nnn/GSM684093/suppl/GSM684093.CEL.gz
## GSM684094 ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM684nnn/GSM684094/suppl/GSM684094.CEL.gz
##           data_row_count age:ch1 copd status:ch1      lm:ch1 notes:ch1
## GSM684089          22011      62               1 870.7318499      none
## GSM684090          22011      62               1 639.6153396      none
## GSM684091          22011      62               1 982.9172824      none
## GSM684092          22011      62               1  801.976686      none
## GSM684093          22011      62               1 726.5157591      none
## GSM684094          22011      62               1  663.336977      none
##           pack years:ch1 patient:ch1 Sex:ch1 slice:ch1
## GSM684089             50        6965    Male         3
## GSM684090             50        6965    Male         4
## GSM684091             50        6965    Male         5
## GSM684092             50        6965    Male         6
## GSM684093             50        6965    Male         7
## GSM684094             50        6965    Male         8

Time for this code chunk: 0.0216760635375977

The level names differ between experiments, but it is important to check that the “Control” group is the first level. If needed, re-level the groups.

pData(geo3)["Disease"] <- pData(geo3)[,"characteristics_ch1.3"]

table(pData(geo3)$Disease)
## 
## copd status: 0 copd status: 1 
##             16             48

Time for this code chunk: 0.0159335136413574
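The raw labels `copd status: 0` and `copd status: 1` can be recoded into an explicit factor, so that the reference level does not depend on how the strings happen to sort. The following is a minimal sketch on toy data, not the code used in this vignette. It assumes that 0 denotes controls and 1 denotes COPD (suggested by the GEO field name, not verified here), and the `status` vector is a hypothetical stand-in for `pData(geo3)$Disease`.

```r
# Toy stand-in for pData(geo3)$Disease (hypothetical values)
status <- c("copd status: 1", "copd status: 0", "copd status: 1", "copd status: 1")

# Explicit levels: the first level becomes the reference group in a linear model.
# Assumption: "copd status: 0" is the control group.
disease <- factor(status,
                  levels = c("copd status: 0", "copd status: 1"),
                  labels = c("Control", "COPD"))

levels(disease)
table(disease)
```

Setting `levels` by hand keeps “Control” first even if the labels in another experiment sort differently.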

Differential expression analysis

Using the DE() function (described above), we fitted a linear regression model to calculate the log fold change of every gene between the “Control” and “COPD” groups. We also renamed the columns by appending the GSE ID, and finally saved the output to a .csv file.

de3 <- DE(geo3)
## [1] "(Intercept)"           "Diseasecopd status: 1"

colnames(de3) <- str_c(colnames(de3),"_",gse3)
colnames(de3)
##  [1] "rownames_GSE27597"        "ID_GSE27597"             
##  [3] "GB_LIST_GSE27597"         "SPOT_ID_GSE27597"        
##  [5] "seqname_GSE27597"         "RANGE_GB_GSE27597"       
##  [7] "RANGE_STRAND_GSE27597"    "RANGE_START_GSE27597"    
##  [9] "RANGE_STOP_GSE27597"      "total_probes_GSE27597"   
## [11] "gene_assignment_GSE27597" "mrna_assignment_GSE27597"
## [13] "category_GSE27597"        "logFC_GSE27597"          
## [15] "CI.L_GSE27597"            "CI.R_GSE27597"           
## [17] "AveExpr_GSE27597"         "t_GSE27597"              
## [19] "P.Value_GSE27597"         "adj.P.Val_GSE27597"      
## [21] "B_GSE27597"
write_csv(de3,
          path=str_c(OUTPUT_DIR,"/TableGenes_",gse3,"_",TODAY,".csv")
          )

Time for this code chunk: 1.68914937973022

GSE37768

The aim of this experiment was to identify genes differentially regulated between normal and COPD lungs. The authors used two control groups, one of smokers and one of non-smokers.

The experiment has no associated article, but these two (https://www.sciencedirect.com/science/article/pii/S1094553910001240?via%3Dihub#fig5 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4022517/) describe a very similar design and share one of the contact contributors of the GEO experiment.

There are 9 non-smoker, 11 smoker and 18 COPD lung tissue samples, but no further information about the samples is available or traceable.

Pre-process raw data

We pre-processed the raw data using the function rawCEL_normCEL; the plots are shown as additional output.

gse4 <- rownames(gse_table)[4]
norm4 <- rawCEL_normCEL(gse4)
## Loading required package: pd.hg.u133.plus.2
## Platform design info loaded.
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE37768/GSM927630_NS1-CEL1.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE37768/GSM927631_NS2-CEL10.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE37768/GSM927632_NS3-CEL11.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE37768/GSM927633_NS4-CEL12.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE37768/GSM927634_NS5-CEL13.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE37768/GSM927635_NS6-CEL20.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE37768/GSM927636_NS7-CEL27.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE37768/GSM927637_NS8-CEL34.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE37768/GSM927638_NS9-CEL28.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE37768/GSM927639_S1-CEL14.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE37768/GSM927640_S2-CEL15.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE37768/GSM927641_S3-CEL16.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE37768/GSM927642_S4-CEL21.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE37768/GSM927643_S5-CEL22.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE37768/GSM927644_S6-CEL23.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE37768/GSM927645_S7-CEL30.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE37768/GSM927646_S8-CEL31.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE37768/GSM927647_S9-CEL35.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE37768/GSM927648_S10-CEL36.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE37768/GSM927649_S11-CEL37.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE37768/GSM927650_COPD1-CEL2.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE37768/GSM927651_COPD2-CEL3.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE37768/GSM927652_COPD3-CEL4.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE37768/GSM927653_COPD4-CEL5.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE37768/GSM927654_COPD5-CEL6.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE37768/GSM927655_COPD6-CEL7.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE37768/GSM927656_COPD7-CEL8.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE37768/GSM927657_COPD8-CEL9.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE37768/GSM927658_COPD9-CEL17.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE37768/GSM927659_COPD10-CEL18.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE37768/GSM927660_COPD11-CEL19.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE37768/GSM927661_COPD12-CEL24.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE37768/GSM927662_COPD13-CEL25.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE37768/GSM927663_COPD14-CEL26.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE37768/GSM927664_COPD15-CEL29.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE37768/GSM927665_COPD16-CEL32.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE37768/GSM927666_COPD17-CEL33.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE37768/GSM927667_COPD18-CEL38.CEL.gz

## Background correcting
## Normalizing
## Calculating Expression

norm4
## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 54675 features, 38 samples 
##   element names: exprs 
## protocolData
##   rowNames: GSM927630_NS1-CEL1.CEL.gz GSM927631_NS2-CEL10.CEL.gz ...
##     GSM927667_COPD18-CEL38.CEL.gz (38 total)
##   varLabels: exprs dates
##   varMetadata: labelDescription channel
## phenoData
##   rowNames: GSM927630_NS1-CEL1.CEL.gz GSM927631_NS2-CEL10.CEL.gz ...
##     GSM927667_COPD18-CEL38.CEL.gz (38 total)
##   varLabels: index
##   varMetadata: labelDescription channel
## featureData: none
## experimentData: use 'experimentData(object)'
## Annotation: pd.hg.u133.plus.2

Time for this code chunk: 47.142466545105

Get annotation

We used the GEOquery package to obtain sample annotations and combined them with our previously calculated pre-processed values to create an ExpressionSet object.

# get annotation using GEOquery package
geo4 <- get_GEO(gse4,norm4)
## Found 1 file(s)
## GSE37768_series_matrix.txt.gz
## Parsed with column specification:
## cols(
##   .default = col_double(),
##   ID_REF = col_character()
## )
## See spec(...) for full column specifications.
## File stored at:
## /tmp/Rtmp4ZNwux/GPL570.soft
## Data downloaded from GEOquery:
## $GSE37768_series_matrix.txt.gz
## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 54675 features, 38 samples 
##   element names: exprs 
## protocolData: none
## phenoData
##   sampleNames: GSM927630 GSM927631 ... GSM927667 (38 total)
##   varLabels: title geo_accession ... tissue:ch1 (33 total)
##   varMetadata: labelDescription
## featureData
##   featureNames: 1007_s_at 1053_at ... AFFX-TrpnX-M_at (54675 total)
##   fvarLabels: ID GB_ACC ... Gene Ontology Molecular Function (16 total)
##   fvarMetadata: Column Description labelDescription
## experimentData: use 'experimentData(object)'
## Annotation: GPL570
## Colnames of GEOquery object:
## [1] "GSM927630" "GSM927631" "GSM927632" "GSM927633" "GSM927634"
## Colnames of calculated pre-processed data:
## [1] "GSM927630_NS1-CEL1.CEL.gz"  "GSM927631_NS2-CEL10.CEL.gz"
## [3] "GSM927632_NS3-CEL11.CEL.gz" "GSM927633_NS4-CEL12.CEL.gz"
## [5] "GSM927634_NS5-CEL13.CEL.gz"

Time for this code chunk: 17.9869315624237
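The output above shows that the GEOquery object names its samples by GSM accession, while the pre-processed data uses the CEL file names. How get_GEO reconciles the two is not shown in this section. One possible way, sketched here in base R as an assumption, is to strip everything after the accession and check that both sets of names agree.

```r
# File-derived column names, as printed above for the pre-processed data
cel_names <- c("GSM927630_NS1-CEL1.CEL.gz", "GSM927631_NS2-CEL10.CEL.gz")
# Accessions as used by the GEOquery object
geo_names <- c("GSM927630", "GSM927631")

# Drop everything from the first underscore or dot onwards,
# which also covers file names such as "GSM684089.CEL.gz"
gsm_ids <- sub("[_.].*$", "", cel_names)

# Confirm both objects describe the same samples in the same order
stopifnot(identical(gsm_ids, geo_names))
gsm_ids
```

Checking the order with `identical()` before combining expression values and annotation prevents samples from being silently mislabelled.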

Select column with COPD description

Each experiment has its own annotation, so we needed to look for the column describing which samples are “Control” and which are “COPD”.

head(pData(geo4))
##                     title geo_accession                status submission_date
## GSM927630 Lung tissue_NS1     GSM927630 Public on Aug 25 2016     May 04 2012
## GSM927631 Lung tissue_NS2     GSM927631 Public on Aug 25 2016     May 04 2012
## GSM927632 Lung tissue_NS3     GSM927632 Public on Aug 25 2016     May 04 2012
## GSM927633 Lung tissue_NS4     GSM927633 Public on Aug 25 2016     May 04 2012
## GSM927634 Lung tissue_NS5     GSM927634 Public on Aug 25 2016     May 04 2012
## GSM927635 Lung tissue_NS6     GSM927635 Public on Aug 25 2016     May 04 2012
##           last_update_date type channel_count
## GSM927630      Aug 25 2016  RNA             1
## GSM927631      Aug 25 2016  RNA             1
## GSM927632      Aug 25 2016  RNA             1
## GSM927633      Aug 25 2016  RNA             1
## GSM927634      Aug 25 2016  RNA             1
## GSM927635      Aug 25 2016  RNA             1
##                              source_name_ch1 organism_ch1
## GSM927630 Peripheral lung tissue,  nonsmoker Homo sapiens
## GSM927631 Peripheral lung tissue,  nonsmoker Homo sapiens
## GSM927632 Peripheral lung tissue,  nonsmoker Homo sapiens
## GSM927633 Peripheral lung tissue,  nonsmoker Homo sapiens
## GSM927634 Peripheral lung tissue,  nonsmoker Homo sapiens
## GSM927635 Peripheral lung tissue,  nonsmoker Homo sapiens
##                      characteristics_ch1 characteristics_ch1.1 molecule_ch1
## GSM927630 tissue: Peripheral lung tissue  phenotype: Nonsmoker    total RNA
## GSM927631 tissue: Peripheral lung tissue  phenotype: Nonsmoker    total RNA
## GSM927632 tissue: Peripheral lung tissue  phenotype: Nonsmoker    total RNA
## GSM927633 tissue: Peripheral lung tissue  phenotype: Nonsmoker    total RNA
## GSM927634 tissue: Peripheral lung tissue  phenotype: Nonsmoker    total RNA
## GSM927635 tissue: Peripheral lung tissue  phenotype: Nonsmoker    total RNA
##                                                                                                                                                        extract_protocol_ch1
## GSM927630 RNA was prepared from grossly homogenous pieces of tissue (100-200 mg) with an RNeasy Midi kit (Qiagen, Valencia, CA)  according to the manufactures instructions
## GSM927631 RNA was prepared from grossly homogenous pieces of tissue (100-200 mg) with an RNeasy Midi kit (Qiagen, Valencia, CA)  according to the manufactures instructions
## GSM927632 RNA was prepared from grossly homogenous pieces of tissue (100-200 mg) with an RNeasy Midi kit (Qiagen, Valencia, CA)  according to the manufactures instructions
## GSM927633 RNA was prepared from grossly homogenous pieces of tissue (100-200 mg) with an RNeasy Midi kit (Qiagen, Valencia, CA)  according to the manufactures instructions
## GSM927634 RNA was prepared from grossly homogenous pieces of tissue (100-200 mg) with an RNeasy Midi kit (Qiagen, Valencia, CA)  according to the manufactures instructions
## GSM927635 RNA was prepared from grossly homogenous pieces of tissue (100-200 mg) with an RNeasy Midi kit (Qiagen, Valencia, CA)  according to the manufactures instructions
##           label_ch1
## GSM927630    biotin
## GSM927631    biotin
## GSM927632    biotin
## GSM927633    biotin
## GSM927634    biotin
## GSM927635    biotin
##                                                                                                                                                                                                                                                                       label_protocol_ch1
## GSM927630 cRNA amplification and biotin labeling was synthesized from 2ug cDNA by means of an in vitro transcription reaction in the presence of T7 RNA Polymerase and of biotinylated nucleotide analog/ribonucleotide mix (GeneChip IVT labeling Kit, Affymetrix Inc, Santa Clara, CA)
## GSM927631 cRNA amplification and biotin labeling was synthesized from 2ug cDNA by means of an in vitro transcription reaction in the presence of T7 RNA Polymerase and of biotinylated nucleotide analog/ribonucleotide mix (GeneChip IVT labeling Kit, Affymetrix Inc, Santa Clara, CA)
## GSM927632 cRNA amplification and biotin labeling was synthesized from 2ug cDNA by means of an in vitro transcription reaction in the presence of T7 RNA Polymerase and of biotinylated nucleotide analog/ribonucleotide mix (GeneChip IVT labeling Kit, Affymetrix Inc, Santa Clara, CA)
## GSM927633 cRNA amplification and biotin labeling was synthesized from 2ug cDNA by means of an in vitro transcription reaction in the presence of T7 RNA Polymerase and of biotinylated nucleotide analog/ribonucleotide mix (GeneChip IVT labeling Kit, Affymetrix Inc, Santa Clara, CA)
## GSM927634 cRNA amplification and biotin labeling was synthesized from 2ug cDNA by means of an in vitro transcription reaction in the presence of T7 RNA Polymerase and of biotinylated nucleotide analog/ribonucleotide mix (GeneChip IVT labeling Kit, Affymetrix Inc, Santa Clara, CA)
## GSM927635 cRNA amplification and biotin labeling was synthesized from 2ug cDNA by means of an in vitro transcription reaction in the presence of T7 RNA Polymerase and of biotinylated nucleotide analog/ribonucleotide mix (GeneChip IVT labeling Kit, Affymetrix Inc, Santa Clara, CA)
##           taxid_ch1
## GSM927630      9606
## GSM927631      9606
## GSM927632      9606
## GSM927633      9606
## GSM927634      9606
## GSM927635      9606
##                                                                                                                                                                                                                                                                                                   hyb_protocol
## GSM927630 Biotin-labeled cRNA was cleanup and quantified and subsequently it was fragmented by metal-induced hydrolysis (GeneChip Sample Cleanup Module, Affymetrix Inc). Hybridization was performed with GeneChip® Hybridization Wash and Stain Kit (Affymetrix Inc) according to manufactures instructions.
## GSM927631 Biotin-labeled cRNA was cleanup and quantified and subsequently it was fragmented by metal-induced hydrolysis (GeneChip Sample Cleanup Module, Affymetrix Inc). Hybridization was performed with GeneChip® Hybridization Wash and Stain Kit (Affymetrix Inc) according to manufactures instructions.
## GSM927632 Biotin-labeled cRNA was cleanup and quantified and subsequently it was fragmented by metal-induced hydrolysis (GeneChip Sample Cleanup Module, Affymetrix Inc). Hybridization was performed with GeneChip® Hybridization Wash and Stain Kit (Affymetrix Inc) according to manufactures instructions.
## GSM927633 Biotin-labeled cRNA was cleanup and quantified and subsequently it was fragmented by metal-induced hydrolysis (GeneChip Sample Cleanup Module, Affymetrix Inc). Hybridization was performed with GeneChip® Hybridization Wash and Stain Kit (Affymetrix Inc) according to manufactures instructions.
## GSM927634 Biotin-labeled cRNA was cleanup and quantified and subsequently it was fragmented by metal-induced hydrolysis (GeneChip Sample Cleanup Module, Affymetrix Inc). Hybridization was performed with GeneChip® Hybridization Wash and Stain Kit (Affymetrix Inc) according to manufactures instructions.
## GSM927635 Biotin-labeled cRNA was cleanup and quantified and subsequently it was fragmented by metal-induced hydrolysis (GeneChip Sample Cleanup Module, Affymetrix Inc). Hybridization was performed with GeneChip® Hybridization Wash and Stain Kit (Affymetrix Inc) according to manufactures instructions.
##                                                                        scan_protocol
## GSM927630 GeneChips were scanned using the GeneChip®Scanner 3000 7G (Affymetrix Inc)
## GSM927631 GeneChips were scanned using the GeneChip®Scanner 3000 7G (Affymetrix Inc)
## GSM927632 GeneChips were scanned using the GeneChip®Scanner 3000 7G (Affymetrix Inc)
## GSM927633 GeneChips were scanned using the GeneChip®Scanner 3000 7G (Affymetrix Inc)
## GSM927634 GeneChips were scanned using the GeneChip®Scanner 3000 7G (Affymetrix Inc)
## GSM927635 GeneChips were scanned using the GeneChip®Scanner 3000 7G (Affymetrix Inc)
##           description
## GSM927630     NS-CEL1
## GSM927631    NS-CEL10
## GSM927632    NS-CEL11
## GSM927633    NS-CEL12
## GSM927634    NS-CEL13
## GSM927635    NS-CEL20
##                                                                                                                            data_processing
## GSM927630 Expression measures were normalized and summarized using the GC content adjusted-Robust Multiarray Analysis (GC-RMA) methodology
## GSM927631 Expression measures were normalized and summarized using the GC content adjusted-Robust Multiarray Analysis (GC-RMA) methodology
## GSM927632 Expression measures were normalized and summarized using the GC content adjusted-Robust Multiarray Analysis (GC-RMA) methodology
## GSM927633 Expression measures were normalized and summarized using the GC content adjusted-Robust Multiarray Analysis (GC-RMA) methodology
## GSM927634 Expression measures were normalized and summarized using the GC content adjusted-Robust Multiarray Analysis (GC-RMA) methodology
## GSM927635 Expression measures were normalized and summarized using the GC content adjusted-Robust Multiarray Analysis (GC-RMA) methodology
##           platform_id    contact_name        contact_email
## GSM927630      GPL570 Ricardo,,Bastos rbastos@clinic.ub.es
## GSM927631      GPL570 Ricardo,,Bastos rbastos@clinic.ub.es
## GSM927632      GPL570 Ricardo,,Bastos rbastos@clinic.ub.es
## GSM927633      GPL570 Ricardo,,Bastos rbastos@clinic.ub.es
## GSM927634      GPL570 Ricardo,,Bastos rbastos@clinic.ub.es
## GSM927635      GPL570 Ricardo,,Bastos rbastos@clinic.ub.es
##                                   contact_department
## GSM927630 Cell Biology, Neurosciences and Immunology
## GSM927631 Cell Biology, Neurosciences and Immunology
## GSM927632 Cell Biology, Neurosciences and Immunology
## GSM927633 Cell Biology, Neurosciences and Immunology
## GSM927634 Cell Biology, Neurosciences and Immunology
## GSM927635 Cell Biology, Neurosciences and Immunology
##                         contact_institute   contact_address contact_city
## GSM927630 IDIBAPS/University of Barcelona Rosellón  149-153    Barcelona
## GSM927631 IDIBAPS/University of Barcelona Rosellón  149-153    Barcelona
## GSM927632 IDIBAPS/University of Barcelona Rosellón  149-153    Barcelona
## GSM927633 IDIBAPS/University of Barcelona Rosellón  149-153    Barcelona
## GSM927634 IDIBAPS/University of Barcelona Rosellón  149-153    Barcelona
## GSM927635 IDIBAPS/University of Barcelona Rosellón  149-153    Barcelona
##           contact_zip/postal_code contact_country
## GSM927630                   08036           Spain
## GSM927631                   08036           Spain
## GSM927632                   08036           Spain
## GSM927633                   08036           Spain
## GSM927634                   08036           Spain
## GSM927635                   08036           Spain
##                                                                                    supplementary_file
## GSM927630  ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM927nnn/GSM927630/suppl/GSM927630_NS1-CEL1.CEL.gz
## GSM927631 ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM927nnn/GSM927631/suppl/GSM927631_NS2-CEL10.CEL.gz
## GSM927632 ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM927nnn/GSM927632/suppl/GSM927632_NS3-CEL11.CEL.gz
## GSM927633 ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM927nnn/GSM927633/suppl/GSM927633_NS4-CEL12.CEL.gz
## GSM927634 ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM927nnn/GSM927634/suppl/GSM927634_NS5-CEL13.CEL.gz
## GSM927635 ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM927nnn/GSM927635/suppl/GSM927635_NS6-CEL20.CEL.gz
##           data_row_count phenotype:ch1             tissue:ch1
## GSM927630          54675     Nonsmoker Peripheral lung tissue
## GSM927631          54675     Nonsmoker Peripheral lung tissue
## GSM927632          54675     Nonsmoker Peripheral lung tissue
## GSM927633          54675     Nonsmoker Peripheral lung tissue
## GSM927634          54675     Nonsmoker Peripheral lung tissue
## GSM927635          54675     Nonsmoker Peripheral lung tissue

Time for this code chunk: 0.0166659355163574

The level names differ between experiments, but it is important to check that the “Control” group is the first level. If needed, re-level the groups.

pData(geo4)["Disease"] <- factor(pData(geo4)[,"phenotype:ch1"])

table(pData(geo4)$Disease)
## 
## healthy smoker  moderate COPD      Nonsmoker 
##             11             18              9

Time for this code chunk: 0.0104644298553467
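The table above lists “healthy smoker” first, so it is the first factor level and acts as the reference group in a model of the form `~ Disease`. If a different control group is wanted as reference, the factor can be re-levelled. The sketch below uses toy data and picks “Nonsmoker” purely for illustration. Which control group should serve as reference is a choice this section does not state, and that DE() builds its design with `model.matrix()` is likewise an assumption.

```r
# Toy phenotype vector mimicking the three groups of GSE37768 (hypothetical order)
phenotype <- c("Nonsmoker", "healthy smoker", "moderate COPD",
               "moderate COPD", "Nonsmoker", "healthy smoker")

disease <- factor(phenotype)

# Choose the reference group explicitly instead of relying on sort order
disease <- relevel(disease, ref = "Nonsmoker")
levels(disease)

# The reference level is absorbed into the intercept;
# every other level gets its own coefficient
design <- model.matrix(~ disease)
colnames(design)
```

With three levels the design has two contrasts against the reference, so it is worth confirming which coefficient the reported logFC refers to.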

Differential expression analysis

Using the DE() function (described above), we fitted a linear regression model to calculate the log fold change of every gene between the “Control” and “COPD” groups. We also renamed the columns by appending the GSE ID, and finally saved the output to a .csv file.

de4 <- DE(geo4)
## [1] "(Intercept)"          "Diseasemoderate COPD" "DiseaseNonsmoker"

colnames(de4) <- str_c(colnames(de4),"_",gse4)
colnames(de4)
##  [1] "rownames_GSE37768"                        
##  [2] "ID_GSE37768"                              
##  [3] "GB_ACC_GSE37768"                          
##  [4] "SPOT_ID_GSE37768"                         
##  [5] "Species.Scientific.Name_GSE37768"         
##  [6] "Annotation.Date_GSE37768"                 
##  [7] "Sequence.Type_GSE37768"                   
##  [8] "Sequence.Source_GSE37768"                 
##  [9] "Target.Description_GSE37768"              
## [10] "Representative.Public.ID_GSE37768"        
## [11] "Gene.Title_GSE37768"                      
## [12] "Gene.Symbol_GSE37768"                     
## [13] "ENTREZ_GENE_ID_GSE37768"                  
## [14] "RefSeq.Transcript.ID_GSE37768"            
## [15] "Gene.Ontology.Biological.Process_GSE37768"
## [16] "Gene.Ontology.Cellular.Component_GSE37768"
## [17] "Gene.Ontology.Molecular.Function_GSE37768"
## [18] "logFC_GSE37768"                           
## [19] "CI.L_GSE37768"                            
## [20] "CI.R_GSE37768"                            
## [21] "AveExpr_GSE37768"                         
## [22] "t_GSE37768"                               
## [23] "P.Value_GSE37768"                         
## [24] "adj.P.Val_GSE37768"                       
## [25] "B_GSE37768"
write_csv(de4,
          path=str_c(OUTPUT_DIR,"/TableGenes_",gse4,"_",TODAY,".csv")
          )

Time for this code chunk: 5.08631730079651

GSE47460

This experiment contains data from the Lung Tissue Research Consortium (LTRC) (https://ltrcpublic.com/). The data comes from lung tissue of 582 subjects in total: 254 have interstitial lung disease, 220 have COPD, and 108 are controls. The samples are split in two because the authors used two different Agilent platforms: Agilent-014850 Whole Human Genome Microarray and Agilent-028004 SurePrint G3 Human.

The experiment is divided in two, with 429 samples on one platform and 153 on the other. The authors don’t provide a summary table of smoking status, so this information has to be retrieved from the GEO metadata.

Raw data

The Agilent raw data was provided in .txt files and was pre-processed using the limma package. The following script describes the pre-processing and gene expression analysis for the Agilent data.

gse5 <- rownames(gse_table)[5]

txt <- list.files(file.path(DATA_DIR,"celfiles",gse5))
txt <- data.frame(file=txt,X=gsub("_.*","",txt))

Time for this code chunk: 0.0131757259368896

Get annotation

The annotation can be found in the targets.csv file.

targets <- read.csv(file.path(DATA_DIR,"celfiles",gse5,"targets.csv"))
targets <- merge(targets,txt, by = "X")
head(targets)
##            X                disease.state.ch1 platform_id
## 1 GSM1149948                          Control    GPL14550
## 2 GSM1149949 Chronic Obstructive Lung Disease     GPL6480
## 3 GSM1149950        Interstitial lung disease    GPL14550
## 4 GSM1149951                          Control    GPL14550
## 5 GSM1149952        Interstitial lung disease    GPL14550
## 6 GSM1149953 Chronic Obstructive Lung Disease     GPL6480
##                             file
## 1 GSM1149948_LT000842RU_CTRL.txt
## 2 GSM1149949_LT001098RU_COPD.txt
## 3  GSM1149950_LT001600RL_ILD.txt
## 4 GSM1149951_LT001796RU_CTRL.txt
## 5  GSM1149952_LT004173LL_ILD.txt
## 6 GSM1149953_LT007392RU_COPD.txt
t1 <- targets[targets$platform_id == "GPL14550",]
head(t1)
##             X         disease.state.ch1 platform_id
## 1  GSM1149948                   Control    GPL14550
## 3  GSM1149950 Interstitial lung disease    GPL14550
## 4  GSM1149951                   Control    GPL14550
## 5  GSM1149952 Interstitial lung disease    GPL14550
## 11 GSM1149958                   Control    GPL14550
## 15 GSM1149962                   Control    GPL14550
##                              file
## 1  GSM1149948_LT000842RU_CTRL.txt
## 3   GSM1149950_LT001600RL_ILD.txt
## 4  GSM1149951_LT001796RU_CTRL.txt
## 5   GSM1149952_LT004173LL_ILD.txt
## 11 GSM1149958_LT022835RL_CTRL.txt
## 15 GSM1149962_LT026501RL_CTRL.txt

Time for this code chunk: 0.0215604305267334
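
Because `merge()` keeps only rows present in both tables, it can be worth checking that no annotated sample was lost and how the groups split across platforms. The following is a minimal sketch on toy data: `targets_demo` and `files_demo` are hypothetical stand-ins whose values mimic the output above, with one file deliberately missing.

```r
# Toy stand-ins for targets.csv and the file table.
targets_demo <- data.frame(
  X = c("GSM1149948", "GSM1149949", "GSM1149950", "GSM1149953"),
  disease.state.ch1 = c("Control", "Chronic Obstructive Lung Disease",
                        "Interstitial lung disease",
                        "Chronic Obstructive Lung Disease"),
  platform_id = c("GPL14550", "GPL6480", "GPL14550", "GPL6480"),
  stringsAsFactors = FALSE
)
files_demo <- data.frame(
  X = c("GSM1149948", "GSM1149949", "GSM1149950"),  # one file deliberately missing
  file = c("GSM1149948_LT000842RU_CTRL.txt",
           "GSM1149949_LT001098RU_COPD.txt",
           "GSM1149950_LT001600RL_ILD.txt"),
  stringsAsFactors = FALSE
)

merged <- merge(targets_demo, files_demo, by = "X")

# Annotated samples that have no raw file and were dropped by merge().
missing_files <- setdiff(targets_demo$X, merged$X)
missing_files

# Samples per disease state and platform.
counts <- table(merged$disease.state.ch1, merged$platform_id)
counts

# Subset one platform, as done for t1.
t1_demo <- merged[merged$platform_id == "GPL14550", ]
```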

The gene annotation for the platform is not included in that file, so we download it from GEO with GEOquery.

gpl <- getGEO("GPL14550")
## File stored at:
## /tmp/Rtmp4ZNwux/GPL14550.soft
gpl <- Table(gpl)
head(gpl)
##                ID         SPOT_ID CONTROL_TYPE REFSEQ GB_ACC GENE GENE_SYMBOL
## 1    (+)E1A_r60_1    (+)E1A_r60_1          pos                 NA            
## 2    (+)E1A_r60_3    (+)E1A_r60_3          pos                 NA            
## 3 (+)E1A_r60_a104 (+)E1A_r60_a104          pos                 NA            
## 4 (+)E1A_r60_a107 (+)E1A_r60_a107          pos                 NA            
## 5 (+)E1A_r60_a135 (+)E1A_r60_a135          pos                 NA            
## 6  (+)E1A_r60_a20  (+)E1A_r60_a20          pos                 NA            
##   GENE_NAME UNIGENE_ID ENSEMBL_ID TIGR_ID ACCESSION_STRING CHROMOSOMAL_LOCATION
## 1                                      NA                                      
## 2                                      NA                                      
## 3                                      NA                                      
## 4                                      NA                                      
## 5                                      NA                                      
## 6                                      NA                                      
##   CYTOBAND DESCRIPTION GO_ID SEQUENCE
## 1                                    
## 2                                    
## 3                                    
## 4                                    
## 5                                    
## 6

Time for this code chunk: 3.56962299346924
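
The first rows of the platform table are control probes without gene symbols, so the table usually has to be filtered before it can map probes to genes. The sketch below shows one way to do this on a toy table. `gpl_demo`, the `probe_*` identifiers and the `GENE_*` symbols are made up, and the coding of regular probes as `"FALSE"` in `CONTROL_TYPE` is an assumption to verify against the downloaded table.

```r
# Toy stand-in for Table(gpl); the real table has many more columns.
gpl_demo <- data.frame(
  ID = c("(+)E1A_r60_1", "probe_1", "probe_2", "probe_3"),
  CONTROL_TYPE = c("pos", "FALSE", "FALSE", "FALSE"),
  GENE_SYMBOL = c("", "GENE_A", "", "GENE_B"),
  stringsAsFactors = FALSE
)

# Drop control probes and probes without a gene symbol.
keep <- gpl_demo$CONTROL_TYPE == "FALSE" & gpl_demo$GENE_SYMBOL != ""
annot <- gpl_demo[keep, c("ID", "GENE_SYMBOL")]

# Look up symbols for a vector of probe names, e.g. the ProbeName column
# of the expression data; unmatched probes give NA.
probes <- c("probe_3", "probe_1", "(+)E1A_r60_1")
symbols <- annot$GENE_SYMBOL[match(probes, annot$ID)]
symbols
```

Using `match()` keeps the result in the same order as `probes`, which makes it easy to attach the symbols to an expression matrix row by row.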

Pre-process raw data

We first read the files into the R environment.

# Read the single-channel (green) Agilent files of the GPL14550 samples
dat1 <- read.maimages(t1$file, path = file.path(DATA_DIR, "celfiles", gse5),
    source = "agilent.median", green.only = TRUE,
    columns = list(G = "gMedianSignal", Gb = "gBGMedianSignal"),
    annotation = c("Row", "Col", "ProbeName", "SystematicName")
)
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1149948_LT000842RU_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1149950_LT001600RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1149951_LT001796RU_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1149952_LT004173LL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1149958_LT022835RL_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1149962_LT026501RL_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1149963_LT028264RL_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1149966_LT037781RU_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1149971_LT045714LU_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1149975_LT047679RL_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1149981_LT052751LL_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1149985_LT056464LL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1149987_LT058471RU_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1149988_LT058983RL_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1149992_LT061842RL_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1149993_LT063974LL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1149996_LT071706RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1149998_LT073345RU_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150001_LT077355RU_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150010_LT092669LU_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150012_LT095376RL_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150014_LT097622RU_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150017_LT104535RL_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150030_LT113211RL_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150031_LT115873LU_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150032_LT117785RL_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150037_LT121655RL_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150040_LT122283RM_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150041_LT122757RU_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150053_LT132955RU_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150055_LT136731LU_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150056_LT137431RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150067_LT157253RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150070_LT160423LL_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150075_LT167208LL_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150078_LT171169LU_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150080_LT172184LU_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150081_LT174005RL_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150085_LT181242RM_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150089_LT187987LL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150090_LT188012RL_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150100_LT205601LL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150119_LT258372RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150122_LT264690LU_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150143_LT003990RU_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150144_LT010012LU_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150145_LT012861RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150146_LT023631RU_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150147_LT030041RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150148_LT044225RL_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150149_LT046027RL_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150150_LT055745RU_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150151_LT058156LU_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150152_LT075094LU_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150153_LT077800RU_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150154_LT089451RU_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150155_LT109097LL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150156_LT115840LL_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150157_LT120371LL_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150158_LT132314LI_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150159_LT141224LL_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150160_LT141870LU_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150161_LT148286LL_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150162_LT155982RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150163_LT158011LL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150164_LT159988RL_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150165_LT163513RL_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150166_LT168204LL_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150167_LT177521RU_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150168_LT178307RU_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150169_LT188161LL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150170_LT194990RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150171_LT198741LU_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150172_LT231101RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150173_LT231373LU_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150174_LT280560LU_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150175_LT280646RU_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150176_LT286300LU_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150177_LT295133LL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150178_LT002501RL_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150179_LT002902RU_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150180_LT007259RU_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150181_LT008331RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150182_LT017275LL_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150183_LT020259LL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150184_LT020426LU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150185_LT024106RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150186_LT024460RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150187_LT025997RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150188_LT028044RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150189_LT028427LU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150190_LT030151RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150191_LT033422RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150192_LT034070LU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150193_LT034821RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150194_LT035239LU_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150195_LT035774RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150196_LT039091RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150197_LT041723RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150198_LT042151RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150199_LT042552RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150200_LT043343LU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150201_LT043798LU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150202_LT057312LU_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150203_LT057972LU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150204_LT058691LU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150205_LT059736LU_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150206_LT059975LU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150207_LT061106RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150208_LT062141RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150209_LT067836RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150210_LT070403LL_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150211_LT072387LU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150212_LT072808RL_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150213_LT075462RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150214_LT076181LI_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150215_LT076617LL_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150216_LT077082RU_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150217_LT077317LI_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150218_LT078404RM_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150219_LT078696LU_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150220_LT080176RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150221_LT080836RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150222_LT082092RU_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150223_LT083706RL_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150224_LT083759RL_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150225_LT084406RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150226_LT084808LU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150227_LT087663RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150228_LT089723LL_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150229_LT094217RM_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150230_LT094532RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150231_LT098394RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150232_LT100707RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150233_LT103266RU_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150234_LT109231RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150235_LT111643RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150236_LT112563LL_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150237_LT112597RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150238_LT113005RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150239_LT115251RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150240_LT118064RL_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150241_LT118801RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150242_LT122336LU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150243_LT126327LU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150244_LT130861RU_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150245_LT134279LL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150246_LT134719RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150247_LT134829RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150248_LT136415RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150249_LT137832LU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150250_LT138418LU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150251_LT139051LU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150252_LT139601RU_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150253_LT140046RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150254_LT140471RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150255_LT148377LU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150256_LT148511LU_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150257_LT151255LL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150258_LT151920RL_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150259_LT151949RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150260_LT152615RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150261_LT152653LU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150262_LT154785RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150263_LT156041LU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150264_LT156276RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150265_LT157177RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150266_LT158647RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150267_LT161434RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150268_LT162479RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150269_LT163771RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150270_LT165736LL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150271_LT167064RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150272_LT168128RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150273_LT170158LL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150274_LT175949LL_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150275_LT176562LL_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150276_LT177956LL_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150277_LT178790LL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150278_LT178929RL_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150279_LT178967RL_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150280_LT180781RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150281_LT182636RU_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150282_LT184241RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150283_LT184772RL_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150284_LT185970RL_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150285_LT186521RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150286_LT188524RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150287_LT189721RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150288_LT190004RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150289_LT191087RM_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150290_LT192758RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150291_LT194473RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150292_LT195188RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150293_LT195871RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150294_LT197511LU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150295_LT198062LL_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150296_LT198134LU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150297_LT198612RL_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150298_LT199384RL_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150299_LT199987LU_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150300_LT203231RM_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150301_LT203541RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150302_LT206005RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150303_LT208505LU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150304_LT208778RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150305_LT210463LU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150306_LT211379RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150307_LT212777RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150308_LT213352LU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150309_LT213735RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150310_LT214473RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150311_LT216419RL_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150312_LT220968RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150313_LT223474RM_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150314_LT228241RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150315_LT230415RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150316_LT233620RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150317_LT236710RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150318_LT238531RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150319_LT239116RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150321_LT242119LU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150331_LT242530RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150332_LT243058RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150334_LT244480LU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150336_LT245031LL_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150338_LT245084RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150339_LT245840RL_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150341_LT246702RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150343_LT249917LL_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150344_LT255244RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150346_LT255718RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150347_LT256532LL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150349_LT257433RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150351_LT261141RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150352_LT262371RM_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150353_LT263636RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150354_LT266802RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150355_LT268509LU_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150357_LT270247RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150358_LT271100LU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150359_LT273284LL_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150361_LT277002LL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150362_LT286056RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150363_LT287158LU_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150364_LT298520RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150365_LT000216LL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150366_LT000379LU_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150367_LT002410RM_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150368_LT005256RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150369_LT005419RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150370_LT006946RM_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150371_LT011501RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150372_LT012933RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150373_LT013011LU_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150374_LT017495RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150375_LT017533RM_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150376_LT017811RM_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150377_LT019699RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150378_LT021461RM_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150379_LT021748RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150380_LT022251RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150381_LT022271LU_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150382_LT022562LL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150383_LT024967LU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150384_LT026534RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150385_LT030347RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150386_LT030662LL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150387_LT036383LL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150388_LT037710RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150389_LT041389RL_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150390_LT046539RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150391_LT047152LL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150392_LT050079RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150393_LT053283RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150394_LT058319LU_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150395_LT059721LU_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150396_LT060717LU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150397_LT062121LL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150398_LT067200LU_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150399_LT069585RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150400_LT076421LL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150401_LT077264RU_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150402_LT078347LL_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150403_LT079487LL_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150404_LT080750LL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150405_LT081282RM_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150406_LT081498RL_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150407_LT082461RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150408_LT083950RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150409_LT084038RM_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150410_LT087826LL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150411_LT090666LI_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150412_LT091552LL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150413_LT093297LU_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150414_LT095342LU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150415_LT099879RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150416_LT100821RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150417_LT102131RL_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150418_LT102695LL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150419_LT108067RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150420_LT111916LL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150421_LT116004RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150422_LT118629RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150423_LT119682RL_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150424_LT123131RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150425_LT123552RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150426_LT124161RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150427_LT126767LL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150428_LT128191LL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150429_LT130603LL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150430_LT134121RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150431_LT134776LL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150432_LT135390LL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150433_LT139649RU_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150434_LT139691RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150435_LT142159RU_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150436_LT143944LL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150437_LT144769RL_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150438_LT145196LU_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150439_LT147610LL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150440_LT151370RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150441_LT151513LL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150442_LT155318RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150443_LT156171LL_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150444_LT156481LL_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150445_LT157856LU_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150446_LT158795LL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150447_LT159753LL_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150448_LT162096RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150449_LT163384LL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150450_LT165114RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150451_LT166111RU_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150452_LT166240RM_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150453_LT167891LU_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150454_LT167906LL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150455_LT168094RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150456_LT168902RM_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150457_LT171097RL_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150458_LT172093RL_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150459_LT173597LL_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150460_LT173946RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150461_LT175399RU_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150462_LT176510LU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150463_LT180102RL_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150464_LT184423LL_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150465_LT184901RL_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150466_LT185396RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150467_LT186388LL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150468_LT190870RU_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150469_LT191618LU_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150470_LT191675RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150471_LT195011RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150472_LT195207LL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150473_LT195484RU_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150474_LT195522RU_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150475_LT196309RU_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150476_LT196677RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150477_LT197381LI_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150478_LT197821RL_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150479_LT200930RL_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150480_LT201348LL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150481_LT201831RU_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150482_LT204935LL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150483_LT206871LU_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150484_LT207073RU_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150485_LT213606RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150486_LT215341RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150487_LT220661RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150488_LT221381RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150489_LT221687RM_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150490_LT221782LL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150491_LT221983RM_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150492_LT223106RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150493_LT227135RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150494_LT228772LL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150495_LT228897LL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150496_LT229669RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150497_LT232073LU_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150498_LT232107RU_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150499_LT233821RL_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150500_LT234205RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150501_LT234755LL_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150502_LT234774LU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150503_LT235441RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150504_LT236045LL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150505_LT236519LL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150506_LT236557RL_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150507_LT237439RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150508_LT238765RL_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150509_LT241811LL_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150510_LT242161RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150511_LT243794RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150512_LT244399LU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150513_LT244824RU_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150514_LT245983LU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150515_LT246349LU_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150516_LT246774RM_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150517_LT247728RL_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150518_LT248906LL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150519_LT249481LU_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150520_LT249811RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150521_LT251693LL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150522_LT251947LL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150523_LT253131RU_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150524_LT253371RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150525_LT253677RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150526_LT256221LL_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150527_LT256920RU_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150528_LT259072LU_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150529_LT264283RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150530_LT266817LL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150531_LT270821RU_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150532_LT277811RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150533_LT279828RU_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150534_LT280282RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150535_LT280851LL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150536_LT282601LL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150537_LT285031LU_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150538_LT285671LU_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150539_LT285906LL_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150540_LT286644RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150541_LT287196LL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150542_LT288719RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150543_LT290677RU_CTRL.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150544_LT291449RL_COPD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150545_LT295717RL_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150546_LT297451LU_ILD.txt 
## Read /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE47460/GSM1150547_LT299181LU_ILD.txt

Time for this code chunk: 2.1515544851621

Then we plot the raw data (this chunk is not evaluated here because it kills the rendering).

boxplot(dat1$E)
hist(dat1$E)

Time for this code chunk: 0.000694990158081055

We background-correct the data (normexp method), normalize between arrays using the quantile method, and calculate log2 values.

dat1 <- backgroundCorrect(dat1, method="normexp", offset=1)
## Array 1 corrected
## Array 2 corrected
## Array 3 corrected
## Array 4 corrected
## Array 5 corrected
## Array 6 corrected
## Array 7 corrected
## Array 8 corrected
## Array 9 corrected
## Array 10 corrected
## Array 11 corrected
## Array 12 corrected
## Array 13 corrected
## Array 14 corrected
## Array 15 corrected
## Array 16 corrected
## Array 17 corrected
## Array 18 corrected
## Array 19 corrected
## Array 20 corrected
## Array 21 corrected
## Array 22 corrected
## Array 23 corrected
## Array 24 corrected
## Array 25 corrected
## Array 26 corrected
## Array 27 corrected
## Array 28 corrected
## Array 29 corrected
## Array 30 corrected
## Array 31 corrected
## Array 32 corrected
## Array 33 corrected
## Array 34 corrected
## Array 35 corrected
## Array 36 corrected
## Array 37 corrected
## Array 38 corrected
## Array 39 corrected
## Array 40 corrected
## Array 41 corrected
## Array 42 corrected
## Array 43 corrected
## Array 44 corrected
## Array 45 corrected
## Array 46 corrected
## Array 47 corrected
## Array 48 corrected
## Array 49 corrected
## Array 50 corrected
## Array 51 corrected
## Array 52 corrected
## Array 53 corrected
## Array 54 corrected
## Array 55 corrected
## Array 56 corrected
## Array 57 corrected
## Array 58 corrected
## Array 59 corrected
## Array 60 corrected
## Array 61 corrected
## Array 62 corrected
## Array 63 corrected
## Array 64 corrected
## Array 65 corrected
## Array 66 corrected
## Array 67 corrected
## Array 68 corrected
## Array 69 corrected
## Array 70 corrected
## Array 71 corrected
## Array 72 corrected
## Array 73 corrected
## Array 74 corrected
## Array 75 corrected
## Array 76 corrected
## Array 77 corrected
## Array 78 corrected
## Array 79 corrected
## Array 80 corrected
## Array 81 corrected
## Array 82 corrected
## Array 83 corrected
## Array 84 corrected
## Array 85 corrected
## Array 86 corrected
## Array 87 corrected
## Array 88 corrected
## Array 89 corrected
## Array 90 corrected
## Array 91 corrected
## Array 92 corrected
## Array 93 corrected
## Array 94 corrected
## Array 95 corrected
## Array 96 corrected
## Array 97 corrected
## Array 98 corrected
## Array 99 corrected
## Array 100 corrected
## Array 101 corrected
## Array 102 corrected
## Array 103 corrected
## Array 104 corrected
## Array 105 corrected
## Array 106 corrected
## Array 107 corrected
## Array 108 corrected
## Array 109 corrected
## Array 110 corrected
## Array 111 corrected
## Array 112 corrected
## Array 113 corrected
## Array 114 corrected
## Array 115 corrected
## Array 116 corrected
## Array 117 corrected
## Array 118 corrected
## Array 119 corrected
## Array 120 corrected
## Array 121 corrected
## Array 122 corrected
## Array 123 corrected
## Array 124 corrected
## Array 125 corrected
## Array 126 corrected
## Array 127 corrected
## Array 128 corrected
## Array 129 corrected
## Array 130 corrected
## Array 131 corrected
## Array 132 corrected
## Array 133 corrected
## Array 134 corrected
## Array 135 corrected
## Array 136 corrected
## Array 137 corrected
## Array 138 corrected
## Array 139 corrected
## Array 140 corrected
## Array 141 corrected
## Array 142 corrected
## Array 143 corrected
## Array 144 corrected
## Array 145 corrected
## Array 146 corrected
## Array 147 corrected
## Array 148 corrected
## Array 149 corrected
## Array 150 corrected
## Array 151 corrected
## Array 152 corrected
## Array 153 corrected
## Array 154 corrected
## Array 155 corrected
## Array 156 corrected
## Array 157 corrected
## Array 158 corrected
## Array 159 corrected
## Array 160 corrected
## Array 161 corrected
## Array 162 corrected
## Array 163 corrected
## Array 164 corrected
## Array 165 corrected
## Array 166 corrected
## Array 167 corrected
## Array 168 corrected
## Array 169 corrected
## Array 170 corrected
## Array 171 corrected
## Array 172 corrected
## Array 173 corrected
## Array 174 corrected
## Array 175 corrected
## Array 176 corrected
## Array 177 corrected
## Array 178 corrected
## Array 179 corrected
## Array 180 corrected
## Array 181 corrected
## Array 182 corrected
## Array 183 corrected
## Array 184 corrected
## Array 185 corrected
## Array 186 corrected
## Array 187 corrected
## Array 188 corrected
## Array 189 corrected
## Array 190 corrected
## Array 191 corrected
## Array 192 corrected
## Array 193 corrected
## Array 194 corrected
## Array 195 corrected
## Array 196 corrected
## Array 197 corrected
## Array 198 corrected
## Array 199 corrected
## Array 200 corrected
## Array 201 corrected
## Array 202 corrected
## Array 203 corrected
## Array 204 corrected
## Array 205 corrected
## Array 206 corrected
## Array 207 corrected
## Array 208 corrected
## Array 209 corrected
## Array 210 corrected
## Array 211 corrected
## Array 212 corrected
## Array 213 corrected
## Array 214 corrected
## Array 215 corrected
## Array 216 corrected
## Array 217 corrected
## Array 218 corrected
## Array 219 corrected
## Array 220 corrected
## Array 221 corrected
## Array 222 corrected
## Array 223 corrected
## Array 224 corrected
## Array 225 corrected
## Array 226 corrected
## Array 227 corrected
## Array 228 corrected
## Array 229 corrected
## Array 230 corrected
## Array 231 corrected
## Array 232 corrected
## Array 233 corrected
## Array 234 corrected
## Array 235 corrected
## Array 236 corrected
## Array 237 corrected
## Array 238 corrected
## Array 239 corrected
## Array 240 corrected
## Array 241 corrected
## Array 242 corrected
## Array 243 corrected
## Array 244 corrected
## Array 245 corrected
## Array 246 corrected
## Array 247 corrected
## Array 248 corrected
## Array 249 corrected
## Array 250 corrected
## Array 251 corrected
## Array 252 corrected
## Array 253 corrected
## Array 254 corrected
## Array 255 corrected
## Array 256 corrected
## Array 257 corrected
## Array 258 corrected
## Array 259 corrected
## Array 260 corrected
## Array 261 corrected
## Array 262 corrected
## Array 263 corrected
## Array 264 corrected
## Array 265 corrected
## Array 266 corrected
## Array 267 corrected
## Array 268 corrected
## Array 269 corrected
## Array 270 corrected
## Array 271 corrected
## Array 272 corrected
## Array 273 corrected
## Array 274 corrected
## Array 275 corrected
## Array 276 corrected
## Array 277 corrected
## Array 278 corrected
## Array 279 corrected
## Array 280 corrected
## Array 281 corrected
## Array 282 corrected
## Array 283 corrected
## Array 284 corrected
## Array 285 corrected
## Array 286 corrected
## Array 287 corrected
## Array 288 corrected
## Array 289 corrected
## Array 290 corrected
## Array 291 corrected
## Array 292 corrected
## Array 293 corrected
## Array 294 corrected
## Array 295 corrected
## Array 296 corrected
## Array 297 corrected
## Array 298 corrected
## Array 299 corrected
## Array 300 corrected
## Array 301 corrected
## Array 302 corrected
## Array 303 corrected
## Array 304 corrected
## Array 305 corrected
## Array 306 corrected
## Array 307 corrected
## Array 308 corrected
## Array 309 corrected
## Array 310 corrected
## Array 311 corrected
## Array 312 corrected
## Array 313 corrected
## Array 314 corrected
## Array 315 corrected
## Array 316 corrected
## Array 317 corrected
## Array 318 corrected
## Array 319 corrected
## Array 320 corrected
## Array 321 corrected
## Array 322 corrected
## Array 323 corrected
## Array 324 corrected
## Array 325 corrected
## Array 326 corrected
## Array 327 corrected
## Array 328 corrected
## Array 329 corrected
## Array 330 corrected
## Array 331 corrected
## Array 332 corrected
## Array 333 corrected
## Array 334 corrected
## Array 335 corrected
## Array 336 corrected
## Array 337 corrected
## Array 338 corrected
## Array 339 corrected
## Array 340 corrected
## Array 341 corrected
## Array 342 corrected
## Array 343 corrected
## Array 344 corrected
## Array 345 corrected
## Array 346 corrected
## Array 347 corrected
## Array 348 corrected
## Array 349 corrected
## Array 350 corrected
## Array 351 corrected
## Array 352 corrected
## Array 353 corrected
## Array 354 corrected
## Array 355 corrected
## Array 356 corrected
## Array 357 corrected
## Array 358 corrected
## Array 359 corrected
## Array 360 corrected
## Array 361 corrected
## Array 362 corrected
## Array 363 corrected
## Array 364 corrected
## Array 365 corrected
## Array 366 corrected
## Array 367 corrected
## Array 368 corrected
## Array 369 corrected
## Array 370 corrected
## Array 371 corrected
## Array 372 corrected
## Array 373 corrected
## Array 374 corrected
## Array 375 corrected
## Array 376 corrected
## Array 377 corrected
## Array 378 corrected
## Array 379 corrected
## Array 380 corrected
## Array 381 corrected
## Array 382 corrected
## Array 383 corrected
## Array 384 corrected
## Array 385 corrected
## Array 386 corrected
## Array 387 corrected
## Array 388 corrected
## Array 389 corrected
## Array 390 corrected
## Array 391 corrected
## Array 392 corrected
## Array 393 corrected
## Array 394 corrected
## Array 395 corrected
## Array 396 corrected
## Array 397 corrected
## Array 398 corrected
## Array 399 corrected
## Array 400 corrected
## Array 401 corrected
## Array 402 corrected
## Array 403 corrected
## Array 404 corrected
## Array 405 corrected
## Array 406 corrected
## Array 407 corrected
## Array 408 corrected
## Array 409 corrected
## Array 410 corrected
## Array 411 corrected
## Array 412 corrected
## Array 413 corrected
## Array 414 corrected
## Array 415 corrected
## Array 416 corrected
## Array 417 corrected
## Array 418 corrected
## Array 419 corrected
## Array 420 corrected
## Array 421 corrected
## Array 422 corrected
## Array 423 corrected
## Array 424 corrected
## Array 425 corrected
## Array 426 corrected
## Array 427 corrected
## Array 428 corrected
## Array 429 corrected
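# Quantile-normalize across arrays, then move to the log2 scale
dat1$E <- normalizeBetweenArrays(dat1$E, method="quantile")
dat1$E <- log2(dat1$E)

# Wrap the normalized values in an MAList and average replicate probes by ProbeName
E <- new("MAList", list(targets=dat1$targets, genes=dat1$genes, source=dat1$source, M=dat1$E, A=dat1$E))
E.avg <- avereps(E, ID=E$genes$ProbeName)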

Time for this code chunk: 2.68028266429901
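
To make the quantile step concrete, here is a minimal base-R sketch of quantile normalization on a toy matrix. It is an illustration only, not the limma implementation: the function `quantile_normalize` and the toy values are hypothetical, and ties are broken by order of appearance, which may differ from how `normalizeBetweenArrays` treats tied values.

```r
# Toy quantile normalization (illustrative; not limma's implementation)
quantile_normalize <- function(x) {
  # Rank of each value within its column (ties broken by order of appearance)
  ranks <- apply(x, 2, rank, ties.method = "first")
  # Reference distribution: mean of the k-th smallest value across columns
  ref <- rowMeans(apply(x, 2, sort))
  # Replace each value by the reference value of the same rank
  out <- apply(ranks, 2, function(r) ref[r])
  dimnames(out) <- dimnames(x)
  out
}

# Hypothetical intensities: 4 probes (rows) x 3 arrays (columns)
toy <- matrix(c(5, 2, 3, 4,
                4, 1, 4, 2,
                3, 4, 6, 8),
              nrow = 4,
              dimnames = list(paste0("probe", 1:4), paste0("array", 1:3)))

toy_norm <- quantile_normalize(toy)
toy_log2 <- log2(toy_norm)

# After normalization every array has the same sorted values
print(apply(toy_norm, 2, sort))
```

Within each array the ordering of the probes is preserved; only the values are mapped onto a shared distribution, which is what makes the box plots of the normalized arrays line up.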

boxplot(dat1$E)

hist(dat1$E)

Time for this code chunk: 17.652147769928

Select column with COPD description

Each experiment has its own annotation and we needed to look for a column describing which sample is a “Control” and which one is “COPD”.

The names will differ between experiments, but it is important to check that the “Control” group is the first level. If needed, re-level the groups.

t1$disease.state.ch1 <- as.factor(t1$disease.state.ch1)
t1$disease.state.ch1 <- relevel(t1$disease.state.ch1,ref = "Control")

table(t1$disease.state.ch1)
## 
##                          Control Chronic Obstructive Lung Disease 
##                               91                              145 
##        Interstitial lung disease 
##                              193

Time for this code chunk: 0.00943946838378906
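
The effect of re-leveling can be checked on a small example. The sketch below uses a hypothetical vector of labels (not the real annotation) to show that R orders factor levels alphabetically by default, so “Control” would not be the reference level unless `relevel` is applied.

```r
# Hypothetical labels mimicking the annotation column
state <- factor(c("Interstitial lung disease", "Control",
                  "Chronic Obstructive Lung Disease", "Control"))
default_levels <- levels(state)   # alphabetical order by default
print(default_levels)

# Make "Control" the first (reference) level
state <- relevel(state, ref = "Control")
print(levels(state))
```

The first level matters because the model below uses it as the baseline against which the other groups are compared.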

Differential expression analysis

Using the DE() function (described above), we fit a linear regression model to calculate the log fold change of every gene between the “Control” and “COPD” groups. We also rename the columns by appending the GSE ID and, finally, save the output to a .CSV file.

fit <- lmFit(E.avg$A,  model.matrix(~1 + t1$disease.state.ch1))
# Empirical Bayes moderation of the lmFit model
ebf <- eBayes(fit)
print(colnames(coef(fit)))
## [1] "(Intercept)"                                         
## [2] "t1$disease.state.ch1Chronic Obstructive Lung Disease"
## [3] "t1$disease.state.ch1Interstitial lung disease"
coeff <- "t1$disease.state.ch1Chronic Obstructive Lung Disease"
# Get the statistics for all genes (no p-value filter), including confidence intervals
res <- topTable(ebf, number = Inf, p.value = 1, coef = coeff, confint = TRUE)

volcanoplot(ebf,coef = coeff,highlight=20, pch=20)

Time for this code chunk: 2.38510274887085
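
For intuition about what the selected coefficient means: with a design of the form `~ 1 + group` and the control group as reference level, the coefficient of a disease level is the difference between that group's mean log2 expression and the control mean, that is, the log fold change. The sketch below shows this with base R `lm` on simulated values for a single hypothetical gene. It does not reproduce limma's empirical Bayes moderation, which affects the t-statistics and p-values but not this estimate.

```r
set.seed(1)

# Hypothetical three-group layout with "Control" as the reference level
group <- factor(rep(c("Control", "COPD", "ILD"), each = 5),
                levels = c("Control", "COPD", "ILD"))

# Simulated log2 expression of one gene (values are made up)
expr <- rnorm(15, mean = c(8, 9, 8.5)[as.integer(group)], sd = 0.3)

# Same design structure as in the analysis: intercept plus one column per non-reference level
design <- model.matrix(~ 1 + group)
print(colnames(design))

# Ordinary least squares fit for this single gene
fit_one <- lm(expr ~ 1 + group)
logfc_copd <- unname(coef(fit_one)["groupCOPD"])

# The coefficient equals the difference in group means
diff_means <- mean(expr[group == "COPD"]) - mean(expr[group == "Control"])
print(all.equal(logfc_copd, diff_means))
```

This is also why the coefficient has to be chosen by name: with three groups the design has two non-intercept columns, and only one of them corresponds to COPD versus control.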

Renaming columns and writing the table.

res <- merge(res,gpl,by.x=0,by.y="ID")

colnames(res) <- str_c(colnames(res),"_",gse5)
colnames(res)
##  [1] "Row.names_GSE47460"            "logFC_GSE47460"               
##  [3] "CI.L_GSE47460"                 "CI.R_GSE47460"                
##  [5] "AveExpr_GSE47460"              "t_GSE47460"                   
##  [7] "P.Value_GSE47460"              "adj.P.Val_GSE47460"           
##  [9] "B_GSE47460"                    "SPOT_ID_GSE47460"             
## [11] "CONTROL_TYPE_GSE47460"         "REFSEQ_GSE47460"              
## [13] "GB_ACC_GSE47460"               "GENE_GSE47460"                
## [15] "GENE_SYMBOL_GSE47460"          "GENE_NAME_GSE47460"           
## [17] "UNIGENE_ID_GSE47460"           "ENSEMBL_ID_GSE47460"          
## [19] "TIGR_ID_GSE47460"              "ACCESSION_STRING_GSE47460"    
## [21] "CHROMOSOMAL_LOCATION_GSE47460" "CYTOBAND_GSE47460"            
## [23] "DESCRIPTION_GSE47460"          "GO_ID_GSE47460"               
## [25] "SEQUENCE_GSE47460"
write_csv(res,
          path=str_c(OUTPUT_DIR,"/TableGenes_",gse5,"_",TODAY,".csv")
          )

Time for this code chunk: 1.09140872955322
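
Appending the GSE ID to every column name keeps the columns distinguishable if the per-experiment tables are later placed side by side. A minimal base-R sketch with a hypothetical two-row table follows; the document itself uses `str_c` and `write_csv`, and `paste0` and `write.csv` are used here only to keep the example free of dependencies. The date format and the temporary output directory are assumptions.

```r
# Hypothetical result table with two genes
toy_res <- data.frame(logFC = c(0.8, -1.2), adj.P.Val = c(0.01, 0.20))
gse_id  <- "GSE47460"

# Append the experiment ID to every column name
colnames(toy_res) <- paste0(colnames(toy_res), "_", gse_id)
print(colnames(toy_res))

# Build an output file name following the same pattern (date format is an assumption)
today    <- format(Sys.Date(), "%Y-%m-%d")
out_file <- file.path(tempdir(), paste0("TableGenes_", gse_id, "_", today, ".csv"))
write.csv(toy_res, out_file, row.names = FALSE)
```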

GSE57148

The authors measured lung tissue expression from 98 COPD patients and 91 controls with normal spirometry. All subjects are male smokers with cancer nodes.

This experiment is RNA-seq; the data are downloaded from Recount2, following the Recount2 vignette.

Recount2 data

We downloaded counts from Recount2

gse6<- rownames(gse_table)[6] 

## Download data from Recount2
url <- download_study('SRP041538',outdir = file.path(DATA_DIR,'SRP041538'))
## 2020-07-09 08:33:43 downloading file rse_gene.Rdata to /home/ana/R-projects/Meta-analysis_COPD/data/SRP041538
load(file.path(DATA_DIR, 'SRP041538', 'rse_gene.Rdata'))

## Scale counts by taking into account the total coverage per sample
rse <- scale_counts(rse_gene)
rse
## class: RangedSummarizedExperiment 
## dim: 58037 187 
## metadata(0):
## assays(1): counts
## rownames(58037): ENSG00000000003.14 ENSG00000000005.5 ...
##   ENSG00000283698.1 ENSG00000283699.1
## rowData names(3): gene_id bp_length symbol
## colnames(187): SRR1265629 SRR1265647 ... SRR1265529 SRR1265505
## colData names(21): project sample ... title characteristics

Time for this code chunk: 1.37412118911743
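
The idea behind scaling the counts can be sketched in base R. Recount2 stores base-pair coverage summed over each gene, so samples sequenced more deeply have larger raw values; scaling brings every sample to a common target library size. The sketch below assumes the AUC-based variant, in which each sample is multiplied by `target_size / auc`, with a target of 40 million reads. The toy numbers and variable names are hypothetical, and the exact formula and defaults used by `scale_counts` should be checked against the recount documentation.

```r
# Hypothetical coverage counts: 2 genes (rows) x 2 samples (columns)
coverage <- matrix(c(2e6, 4e6,
                     1e6, 8e6),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("geneA", "geneB"), c("s1", "s2")))

# Hypothetical total coverage (area under the coverage curve) per sample
auc <- c(s1 = 4e9, s2 = 8e9)

# Assumed target library size
target_size <- 4e7

# One scaling factor per sample, applied column-wise, then rounded to integers
scale_factor <- target_size / auc
scaled <- round(sweep(coverage, 2, scale_factor, `*`))
print(scaled)
```

In this toy case geneA has twice the raw coverage in s2, but s2 also has twice the total coverage, so the scaled values are equal.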

Get annotation

The sample annotation is also taken from Recount2, following its vignette.

## Sample annotation
geochar <- lapply(split(colData(rse_gene), seq_len(nrow(colData(rse_gene)))),geo_characteristics)

geochar <- do.call(rbind, lapply(geochar, function(x) {
    if('cells' %in% colnames(x)) {
        colnames(x)[colnames(x) == 'cells'] <- 'cell.line'
        return(x)
    } else {
        return(x)
    }
}))

table(geochar$disease.state)
## 
##   COPD Normal 
##     96     91

Time for this code chunk: 1.2140953540802

Select column with COPD description

Then we select the disease.state information to use in the differential expression analysis.

## Add sample information for DE analysis
colData(rse)$group <-factor(geochar$disease.state, levels = c("Normal","COPD"))
#write.csv(assay(rse),str_c("data/normData/","GSE57148","_normData.txt"),quote=F)

Time for this code chunk: 0.0408027172088623

Differential expression analysis

Using the DESeq2 package, we calculated the differentially expressed genes (DEG).

## Specify design and switch to DESeq2 format
 dds <- DESeqDataSet(rse, ~ group)
## converting counts to integer mode
 ## Perform DE analysis
 dds <- DESeq(dds)
## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
## -- replacing outliers and refitting for 1439 genes
## -- DESeq argument 'minReplicatesForReplace' = 7 
## -- original counts are preserved in counts(dds)
## estimating dispersions
## fitting model and testing
 res <- results(dds)
 
 # Calculate the 95% confidence interval (CI) of the log2 fold change
 res$error <- qnorm(0.975)*res$lfcSE
 res$CI.L <- res$log2FoldChange-res$error
 res$CI.R <- res$log2FoldChange+res$error
 
 res
## log2 fold change (MLE): group COPD vs Normal 
## Wald test p-value: group COPD vs Normal 
## DataFrame with 58037 rows and 9 columns
##                      baseMean log2FoldChange     lfcSE       stat      pvalue
##                     <numeric>      <numeric> <numeric>  <numeric>   <numeric>
## ENSG00000000003.14  924.46223     -0.1311028 0.0763894  -1.716244   0.0861174
## ENSG00000000005.5     2.85737     -0.4396692 0.4629495  -0.949713   0.3422580
## ENSG00000000419.12 1089.25569      0.0152275 0.0300546   0.506663   0.6123911
## ENSG00000000457.13  698.75730     -0.0656469 0.0405149  -1.620314   0.1051648
## ENSG00000000460.16  347.79272     -0.0502240 0.0465180  -1.079669   0.2802896
## ...                       ...            ...       ...        ...         ...
## ENSG00000283695.1   0.0454355     -0.1779953 1.9047251 -0.0934493 9.25547e-01
## ENSG00000283696.1  37.5449930      0.5120456 0.0933836  5.4832482 4.17586e-08
## ENSG00000283697.1  21.2156969      0.0782234 0.1087300  0.7194274 4.71878e-01
## ENSG00000283698.1   0.1327733     -0.1294404 0.9741244 -0.1328787 8.94289e-01
## ENSG00000283699.1   0.0646975      0.1592290 1.3843041  0.1150246 9.08426e-01
##                           padj     error       CI.L      CI.R
##                      <numeric> <numeric>  <numeric> <numeric>
## ENSG00000000003.14    0.177940 0.1497204 -0.2808232 0.0186176
## ENSG00000000005.5     0.502138 0.9073642 -1.3470334 0.4676951
## ENSG00000000419.12    0.743547 0.0589059 -0.0436783 0.0741334
## ENSG00000000457.13    0.207758 0.0794078 -0.1450547 0.0137609
## ENSG00000000460.16    0.435330 0.0911736 -0.1413976 0.0409495
## ...                        ...       ...        ...       ...
## ENSG00000283695.1           NA  3.733193  -3.911188  3.555197
## ENSG00000283696.1  5.80204e-07  0.183029   0.329017  0.695074
## ENSG00000283697.1  6.26366e-01  0.213107  -0.134884  0.291330
## ENSG00000283698.1           NA  1.909249  -2.038689  1.779808
## ENSG00000283699.1           NA  2.713186  -2.553957  2.872415
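
The interval columns follow the usual Wald construction: the log2 fold change plus or minus qnorm(0.975) times its standard error. A minimal sketch in base R, using three rows copied from the table above rather than a real DESeq2 object (the data frame `toy` and the column `excludes_zero` are names chosen only for this illustration):

```r
# Toy values copied from the printed results above; this is not a DESeq2 object.
toy <- data.frame(
  log2FoldChange = c(-0.1311028, 0.0152275, 0.5120456),
  lfcSE          = c(0.0763894, 0.0300546, 0.0933836),
  row.names      = c("ENSG00000000003.14", "ENSG00000000419.12",
                     "ENSG00000283696.1")
)

# Half-width of a two-sided 95% Wald interval: z_{0.975} * SE
z <- qnorm(0.975)
toy$error <- z * toy$lfcSE
toy$CI.L  <- toy$log2FoldChange - toy$error
toy$CI.R  <- toy$log2FoldChange + toy$error

# An interval that excludes zero corresponds to an unadjusted
# Wald p-value below 0.05 (it says nothing about padj).
toy$excludes_zero <- toy$CI.L > 0 | toy$CI.R < 0
print(toy)
```

Because the interval is built from the unadjusted standard error, it matches the `pvalue` column, not the multiple-testing-adjusted `padj` column.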
 ## Extract Gencode gene ids
 gencode <- gsub('\\..*', '', names(recount_genes))
 
## Find the gene information we are interested in
gene_info <- AnnotationDbi::select(org.Hs.eg.db, gencode, c('SYMBOL', 'ENSEMBL'), 'ENSEMBL')
## 'select()' returned many:many mapping between keys and columns
r <- as_tibble(res, rownames="rownames")
r$rownames <- gsub("\\..*","",r$rownames)
r <- full_join(r,gene_info, by=c("rownames"="ENSEMBL")) 

colnames(r) <- str_c(colnames(r),"_",gse6)
colnames(r)
##  [1] "rownames_GSE57148"       "baseMean_GSE57148"      
##  [3] "log2FoldChange_GSE57148" "lfcSE_GSE57148"         
##  [5] "stat_GSE57148"           "pvalue_GSE57148"        
##  [7] "padj_GSE57148"           "error_GSE57148"         
##  [9] "CI.L_GSE57148"           "CI.R_GSE57148"          
## [11] "SYMBOL_GSE57148"
write_csv(r,
          path=str_c(OUTPUT_DIR,"/TableGenes_",gse6,"_",TODAY,".csv"))

Time for this code chunk: 1.84871580998103
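
The `many:many` message from `select()` is worth keeping in mind when reading the exported table: once version suffixes are stripped, one Ensembl ID can map to several symbols, and a join then repeats the result row for each symbol. A small base R sketch of that behaviour, with made-up IDs and symbols (`res_toy`, `map_toy`, `GSE_TOY`, and every value in them are hypothetical; `merge()` stands in for `full_join()`):

```r
# Hypothetical result rows with versioned Ensembl-style IDs
res_toy <- data.frame(
  rownames       = c("ENSG00000000001.14", "ENSG00000000002.5"),
  log2FoldChange = c(-0.13, -0.44),
  stringsAsFactors = FALSE
)

# Strip the version suffix (everything from the first dot onwards)
res_toy$rownames <- gsub("\\..*", "", res_toy$rownames)

# Hypothetical annotation in which the second ID maps to two symbols
map_toy <- data.frame(
  ENSEMBL = c("ENSG00000000001", "ENSG00000000002", "ENSG00000000002"),
  SYMBOL  = c("GENEA", "GENEB", "GENEB2"),
  stringsAsFactors = FALSE
)

# Full outer join: the row for the second ID appears once per symbol
joined <- merge(res_toy, map_toy,
                by.x = "rownames", by.y = "ENSEMBL", all = TRUE)

# IDs that occur more than once after the join
dup_ids <- unique(joined$rownames[duplicated(joined$rownames)])

# Tag every column with the experiment accession, as done for the real table
colnames(joined) <- paste0(colnames(joined), "_", "GSE_TOY")
print(joined)
```

Checking `dup_ids` before combining experiments is one way to decide whether duplicated genes should be collapsed first.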

GSE8581

The experiment aims to find biomarkers that help with early diagnosis. The authors used lung tissue from 18 smokers with nodules suspicious for lung cancer as controls, 15 COPD patients, and 23 individuals who fall between the two groups. The criteria were, for COPD: FEV1 < 70% and FEV1/FVC < 0.7; for controls: FEV1 > 80% predicted and FEV1/FVC > 0.7.
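
These spirometry criteria can be written as a small rule. This is only a sketch: the function `classify_spirometry`, its argument names, and the example values are hypothetical, and it assumes FEV1 is given as percent of predicted for both groups and FEV1/FVC as a fraction. Samples that meet neither rule are labelled "Unclassified".

```r
# Hypothetical helper: assign a group from two spirometry measures.
# fev1_pct : FEV1 as percent of predicted (assumption)
# fev1_fvc : FEV1/FVC ratio as a fraction between 0 and 1 (assumption)
classify_spirometry <- function(fev1_pct, fev1_fvc) {
  ifelse(fev1_pct < 70 & fev1_fvc < 0.7, "COPD",
         ifelse(fev1_pct > 80 & fev1_fvc > 0.7, "Control", "Unclassified"))
}

# Made-up example values, one per expected outcome plus a mixed case
groups <- classify_spirometry(
  fev1_pct = c(55, 95, 75, 85),
  fev1_fvc = c(0.58, 0.84, 0.72, 0.65)
)
print(groups)
```

The last example value shows why an intermediate group exists: a sample can pass one threshold for controls and fail the other.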

The authors measured expression with the [HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array.

In the GEOquery annotation, we found 19 Controls, 16 COPD and 23 Unclassified.

Pre-process raw data

We pre-processed the raw data with the function rawCEL_normCEL; plots are shown as additional output.

gse7<- rownames(gse_table)[7] 
norm7 <- rawCEL_normCEL(gse7)
## Platform design info loaded.
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE8581/GSM210004.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE8581/GSM210005.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE8581/GSM210006.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE8581/GSM210007.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE8581/GSM210008.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE8581/GSM210009.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE8581/GSM210010.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE8581/GSM210011.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE8581/GSM210012.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE8581/GSM210014.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE8581/GSM210015.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE8581/GSM210071.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE8581/GSM210087.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE8581/GSM210090.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE8581/GSM210188.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE8581/GSM210192.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE8581/GSM210193.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE8581/GSM210194.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE8581/GSM210196.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE8581/GSM210978.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE8581/GSM210979.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE8581/GSM210992.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE8581/GSM210993.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE8581/GSM210994.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE8581/GSM211007.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE8581/GSM211008.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE8581/GSM211009.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE8581/GSM211010.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE8581/GSM211865.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE8581/GSM211872.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE8581/GSM212067.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE8581/GSM212068.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE8581/GSM212069.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE8581/GSM212070.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE8581/GSM212074.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE8581/GSM212075.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE8581/GSM212787.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE8581/GSM212788.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE8581/GSM212789.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE8581/GSM212790.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE8581/GSM212809.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE8581/GSM212810.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE8581/GSM212811.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE8581/GSM212848.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE8581/GSM212849.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE8581/GSM212850.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE8581/GSM212852.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE8581/GSM212853.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE8581/GSM212854.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE8581/GSM212855.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE8581/GSM213017.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE8581/GSM213018.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE8581/GSM213019.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE8581/GSM213020.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE8581/GSM213034.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE8581/GSM213035.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE8581/GSM213036.CEL.gz
## Reading in : /home/ana/R-projects/Meta-analysis_COPD/data/celfiles/GSE8581/GSM213037.CEL.gz

## Background correcting
## Normalizing
## Calculating Expression

norm7
## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 54675 features, 58 samples 
##   element names: exprs 
## protocolData
##   rowNames: GSM210004.CEL.gz GSM210005.CEL.gz ... GSM213037.CEL.gz (58
##     total)
##   varLabels: exprs dates
##   varMetadata: labelDescription channel
## phenoData
##   rowNames: GSM210004.CEL.gz GSM210005.CEL.gz ... GSM213037.CEL.gz (58
##     total)
##   varLabels: index
##   varMetadata: labelDescription channel
## featureData: none
## experimentData: use 'experimentData(object)'
## Annotation: pd.hg.u133.plus.2

Time for this code chunk: 52.6767470836639
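
The log messages above name three stages: background correction, normalization, and expression calculation. To give a feel for the middle stage, here is a minimal base R sketch of quantile normalization, the normalization that RMA uses by default. It is an illustration on a made-up matrix, not the code inside rawCEL_normCEL; the helper `quantile_normalize` is hypothetical, and ties are broken by order of appearance for simplicity.

```r
# Hypothetical helper: force every column (sample) to share the same
# empirical distribution, namely the mean of the sorted columns.
quantile_normalize <- function(x) {
  ranks  <- apply(x, 2, rank, ties.method = "first")  # rank within each sample
  sorted <- apply(x, 2, sort)                         # sort each sample
  target <- rowMeans(sorted)                          # reference distribution
  out <- apply(ranks, 2, function(r) target[r])       # map ranks to reference
  dimnames(out) <- dimnames(x)
  out
}

# Made-up intensities: 4 probes (rows) x 3 samples (columns)
toy <- matrix(c(5, 2, 3, 4,
                4, 1, 4, 2,
                3, 4, 6, 8),
              nrow = 4,
              dimnames = list(paste0("probe", 1:4), paste0("sample", 1:3)))

norm_toy <- quantile_normalize(toy)

# After normalization every sample holds the same set of values
print(apply(norm_toy, 2, sort))
```

Within each sample the ordering of probes is unchanged; only the values are replaced by the shared reference distribution, which is what makes arrays comparable before summarization.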

Get annotation

We used the GEOquery package to obtain sample annotations and combined them with our previously pre-processed values to create an ExpressionSet object.

# get annotation using GEOquery package
geo7 <- get_GEO(gse7,norm7)
## Found 1 file(s)
## GSE8581_series_matrix.txt.gz
## Parsed with column specification:
## cols(
##   .default = col_double(),
##   ID_REF = col_character()
## )
## See spec(...) for full column specifications.
## Using locally cached version of GPL570 found here:
## /tmp/Rtmp4ZNwux/GPL570.soft
## Data downloaded from GEOquery:
## $GSE8581_series_matrix.txt.gz
## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 54675 features, 58 samples 
##   element names: exprs 
## protocolData: none
## phenoData
##   sampleNames: GSM210004 GSM210005 ... GSM213037 (58 total)
##   varLabels: title geo_accession ... Race:ch1 (46 total)
##   varMetadata: labelDescription
## featureData
##   featureNames: 1007_s_at 1053_at ... AFFX-TrpnX-M_at (54675 total)
##   fvarLabels: ID GB_ACC ... Gene Ontology Molecular Function (16 total)
##   fvarMetadata: Column Description labelDescription
## experimentData: use 'experimentData(object)'
##   pubMedIds: 18849563 
## Annotation: GPL570
## Colnames of GEOquery object:
## [1] "GSM210004" "GSM210005" "GSM210006" "GSM210007" "GSM210008"
## Colnames of calculated pre-processed data:
## [1] "GSM210004.CEL.gz" "GSM210005.CEL.gz" "GSM210006.CEL.gz" "GSM210007.CEL.gz"
## [5] "GSM210008.CEL.gz"

Time for this code chunk: 12.3359503746033
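
The two sets of column names printed above differ only by the `.CEL.gz` file extension, so the pre-processed matrix has to be renamed before it can be paired with the GEO annotation. A minimal sketch of that alignment in base R, on a made-up matrix (the object names `geo_ids` and `expr_toy` are hypothetical, and this is not the body of `get_GEO`):

```r
# Sample IDs as they appear in the GEO annotation
geo_ids <- c("GSM210004", "GSM210005", "GSM210006")

# Made-up expression matrix whose columns are named after the CEL files,
# deliberately in a different order from geo_ids
expr_toy <- matrix(
  1:6, nrow = 2,
  dimnames = list(c("1007_s_at", "1053_at"),
                  c("GSM210005.CEL.gz", "GSM210004.CEL.gz", "GSM210006.CEL.gz"))
)

# Drop the file extension so the names become plain GSM accessions
colnames(expr_toy) <- sub("\\.CEL\\.gz$", "", colnames(expr_toy))

# Fail early if the two sources do not describe the same samples
stopifnot(setequal(colnames(expr_toy), geo_ids))

# Reorder columns to follow the annotation
expr_toy <- expr_toy[, geo_ids, drop = FALSE]
print(expr_toy)
```

Reordering by name instead of position guards against pairing expression values with the wrong sample annotation.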

Select column with COPD description

Each experiment has its own annotation, so we needed to find the column that describes which samples are “Control” and which are “COPD”.
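
One way to derive such a column is to pattern-match on the sample titles. The sketch below assumes titles of the form used in this series (`Human_COPD_Case1`, `Human_Control1`, `Human_Lung_Unclassified1`). Whether the title or another field was used in the actual analysis is not stated here, and `pheno_toy`, `pheno_sub`, and `group` are hypothetical names.

```r
# Made-up phenotype table with titles in the style of this series
pheno_toy <- data.frame(
  title = c("Human_COPD_Case1", "Human_Control1",
            "Human_COPD_Case2", "Human_Lung_Unclassified1"),
  row.names = c("GSM210004", "GSM210005", "GSM210006", "GSM210007"),
  stringsAsFactors = FALSE
)

# Assign a group label from keywords in the title
pheno_toy$group <- NA_character_
pheno_toy$group[grepl("COPD", pheno_toy$title)]         <- "COPD"
pheno_toy$group[grepl("Control", pheno_toy$title)]      <- "Control"
pheno_toy$group[grepl("Unclassified", pheno_toy$title)] <- "Unclassified"

# Check the counts, including any title that matched nothing
print(table(pheno_toy$group, useNA = "ifany"))

# Keep only the two groups of interest, with Control as reference level
keep <- pheno_toy$group %in% c("COPD", "Control")
pheno_sub <- pheno_toy[keep, , drop = FALSE]
pheno_sub$group <- factor(pheno_sub$group, levels = c("Control", "COPD"))
```

Setting "Control" as the first factor level means a downstream model would report fold changes as COPD relative to Control.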

head(pData(geo7))
##                              title geo_accession                status
## GSM210004         Human_COPD_Case1     GSM210004 Public on May 31 2008
## GSM210005           Human_Control1     GSM210005 Public on May 31 2008
## GSM210006         Human_COPD_Case2     GSM210006 Public on May 31 2008
## GSM210007 Human_Lung_Unclassified1     GSM210007 Public on May 31 2008
## GSM210008         Human_COPD_Case3     GSM210008 Public on May 31 2008
## GSM210009           Human_Control2     GSM210009 Public on May 31 2008
##           submission_date last_update_date type channel_count source_name_ch1
## GSM210004     Jul 12 2007      Aug 28 2018  RNA             1      Whole lung
## GSM210005     Jul 12 2007      Aug 28 2018  RNA             1      Whole Lung
## GSM210006     Jul 12 2007      Aug 28 2018  RNA             1      Whole Lung
## GSM210007     Jul 12 2007      Aug 28 2018  RNA             1      Whole Lung
## GSM210008     Jul 12 2007      Aug 28 2018  RNA             1      Whole Lung
## GSM210009     Jul 12 2007      Aug 28 2018  RNA             1      Whole Lung
##           organism_ch1
## GSM210004 Homo sapiens
## GSM210005 Homo sapiens
## GSM210006 Homo sapiens
## GSM210007 Homo sapiens
## GSM210008 Homo sapiens
## GSM210009 Homo sapiens
##                                                     characteristics_ch1
## GSM210004         Race: Caucasian, Age: 63, Gender: Male, Height: 72in.
## GSM210005 Race: AfricanAmerican, Gender: Female, Age: 84, Height: 60in.
## GSM210006 Race: AfricanAmerican, Gender: Female, Age: 65, Height: 66in.
## GSM210007         Race: Caucasian, Age: 46, Gender: Male, Height: 66in.
## GSM210008       Race: Caucasian, Age: 53, Gender: Female, Height: 65in.
## GSM210009       Race: Caucasian, Age: 60, Gender: Female, Height: 64in.
##           characteristics_ch1.1 biomaterial_provider_ch1 molecule_ch1
## GSM210004                                    Mariani Lab    total RNA
## GSM210005                                    Mariani Lab    total RNA
## GSM210006                                    Mariani Lab    total RNA
## GSM210007                                    Mariani Lab    total RNA
## GSM210008                                    Mariani Lab    total RNA
## GSM210009                                    Mariani Lab    total RNA
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       extract_protocol_ch1
## GSM210004  For each experimental sample, RNA quality was assessed by RNA Nano LabChip analysis on an Agilent Bioanalyzer 2100. Concentrations may also be determined using a NanoDrop 1000 spectrophotometer. Under standard conditions processing of RNAs for GeneChip Analysis was in accordance with methods described in the Affymetrix GeneChip Expression Analysis Technical Manual, revision four, as subsequently detailed.   Synthesis of cDNA first and second strand is performed using the GeneChip Expression 3’-Amplification Reagents One-Cycle cDNA Synthesis Kit (P/N 900431).  Cleanup of the double stranded product is carried according to standard Affymetrix protocols using the Affymetrix GeneChip Cleanup Module (Affymetrix Catalog # 900371). 
## GSM210005   For each experimental sample, RNA quality was assessed by RNA Nano LabChip analysis on an Agilent Bioanalyzer 2100. Concentrations may also be determined using a NanoDrop 1000 spectrophotometer. Under standard conditions processing of RNAs for GeneChip Analysis was in accordance with methods described in the Affymetrix GeneChip Expression Analysis Technical Manual, revision four, as subsequently detailed.   Synthesis of cDNA first and second strand is performed using the GeneChip Expression 3’-Amplification Reagents One-Cycle cDNA Synthesis Kit (P/N 900431).  Cleanup of the double stranded product is carried according to standard Affymetrix protocols using the Affymetrix GeneChip Cleanup Module (Affymetrix Catalog # 900371).
## GSM210006 For each experimental sample, RNA quality was assessed by RNA Nano LabChip analysis on an Agilent Bioanalyzer 2100. Concentrations may also be determined using a NanoDrop 1000 spectrophotometer. Under standard conditions processing of RNAs for GeneChip Analysis was in accordance with methods described in the Affymetrix GeneChip Expression Analysis Technical Manual, revision four, as subsequently detailed.   Synthesis of cDNA first and second strand is performed using the GeneChip Expression 3’-Amplification Reagents One-Cycle cDNA Synthesis Kit (P/N 900431).  Cleanup of the double stranded product is carried according to standard Affymetrix protocols using the Affymetrix GeneChip Cleanup Module (Affymetrix Catalog # 900371).  
## GSM210007 For each experimental sample, RNA quality was assessed by RNA Nano LabChip analysis on an Agilent Bioanalyzer 2100. Concentrations may also be determined using a NanoDrop 1000 spectrophotometer. Under standard conditions processing of RNAs for GeneChip Analysis was in accordance with methods described in the Affymetrix GeneChip Expression Analysis Technical Manual, revision four, as subsequently detailed.   Synthesis of cDNA first and second strand is performed using the GeneChip Expression 3’-Amplification Reagents One-Cycle cDNA Synthesis Kit (P/N 900431).  Cleanup of the double stranded product is carried according to standard Affymetrix protocols using the Affymetrix GeneChip Cleanup Module (Affymetrix Catalog # 900371).  
## GSM210008 For each experimental sample, RNA quality was assessed by RNA Nano LabChip analysis on an Agilent Bioanalyzer 2100. Concentrations may also be determined using a NanoDrop 1000 spectrophotometer. Under standard conditions processing of RNAs for GeneChip Analysis was in accordance with methods described in the Affymetrix GeneChip Expression Analysis Technical Manual, revision four, as subsequently detailed.   Synthesis of cDNA first and second strand is performed using the GeneChip Expression 3’-Amplification Reagents One-Cycle cDNA Synthesis Kit (P/N 900431).  Cleanup of the double stranded product is carried according to standard Affymetrix protocols using the Affymetrix GeneChip Cleanup Module (Affymetrix Catalog # 900371).  
## GSM210009   For each experimental sample, RNA quality was assessed by RNA Nano LabChip analysis on an Agilent Bioanalyzer 2100. Concentrations may also be determined using a NanoDrop 1000 spectrophotometer. Under standard conditions processing of RNAs for GeneChip Analysis was in accordance with methods described in the Affymetrix GeneChip Expression Analysis Technical Manual, revision four, as subsequently detailed.   Synthesis of cDNA first and second strand is performed using the GeneChip Expression 3’-Amplification Reagents One-Cycle cDNA Synthesis Kit (P/N 900431).  Cleanup of the double stranded product is carried according to standard Affymetrix protocols using the Affymetrix GeneChip Cleanup Module (Affymetrix Catalog # 900371).
##           label_ch1
## GSM210004    Biotin
## GSM210005    Biotin
## GSM210006    Biotin
## GSM210007    Biotin
## GSM210008    Biotin
## GSM210009    Biotin
##                                                                                                                                                                                                                                                                             label_protocol_ch1
## GSM210004 In vitro transcription (IVT) is performed using the GeneChip Expression Amplification Reagents kit- 30 reactions (P/N 900449) and is carried out according to the standard Affymetrix protocols and quantification of the IVT samples is carried out on a Bio-Tek UV Plate Reader.  
## GSM210005    In vitro transcription (IVT) is performed using the GeneChip Expression Amplification Reagents kit- 30 reactions (P/N 900449) and is carried out according to the standard Affymetrix protocols and quantification of the IVT samples is carried out on a Bio-Tek UV Plate Reader
## GSM210006 In vitro transcription (IVT) is performed using the GeneChip Expression Amplification Reagents kit- 30 reactions (P/N 900449) and is carried out according to the standard Affymetrix protocols and quantification of the IVT samples is carried out on a Bio-Tek UV Plate Reader.  
## GSM210007 In vitro transcription (IVT) is performed using the GeneChip Expression Amplification Reagents kit- 30 reactions (P/N 900449) and is carried out according to the standard Affymetrix protocols and quantification of the IVT samples is carried out on a Bio-Tek UV Plate Reader.  
## GSM210008 In vitro transcription (IVT) is performed using the GeneChip Expression Amplification Reagents kit- 30 reactions (P/N 900449) and is carried out according to the standard Affymetrix protocols and quantification of the IVT samples is carried out on a Bio-Tek UV Plate Reader.  
## GSM210009   In vitro transcription (IVT) is performed using the GeneChip Expression Amplification Reagents kit- 30 reactions (P/N 900449) and is carried out according to the standard Affymetrix protocols and quantification of the IVT samples is carried out on a Bio-Tek UV Plate Reader.
##           taxid_ch1
## GSM210004      9606
## GSM210005      9606
## GSM210006      9606
## GSM210007      9606
## GSM210008      9606
## GSM210009      9606
##                                                                                                                                                                                                                                                                           hyb_protocol
## GSM210004 Hybridization is carried out according the Affymetrix GeneChip® Manual. Twenty micrograms of IVT material is the nominal amount used on the GeneChip® arrays.  Affymetrix hybridization ovens are used to incubate the arrays at a constant temperature of 45oC overnight.  
## GSM210005 Hybridization is carried out according the Affymetrix GeneChip® Manual. Twenty micrograms of IVT material is the nominal amount used on the GeneChip® arrays.  Affymetrix hybridization ovens are used to incubate the arrays at a constant temperature of 45oC overnight.  
## GSM210006 Hybridization is carried out according the Affymetrix GeneChip® Manual. Twenty micrograms of IVT material is the nominal amount used on the GeneChip® arrays.  Affymetrix hybridization ovens are used to incubate the arrays at a constant temperature of 45oC overnight.  
## GSM210007 Hybridization is carried out according the Affymetrix GeneChip® Manual. Twenty micrograms of IVT material is the nominal amount used on the GeneChip® arrays.  Affymetrix hybridization ovens are used to incubate the arrays at a constant temperature of 45oC overnight.  
## GSM210008 Hybridization is carried out according the Affymetrix GeneChip® Manual. Twenty micrograms of IVT material is the nominal amount used on the GeneChip® arrays.  Affymetrix hybridization ovens are used to incubate the arrays at a constant temperature of 45oC overnight.  
## GSM210009 Hybridization is carried out according the Affymetrix GeneChip® Manual. Twenty micrograms of IVT material is the nominal amount used on the GeneChip® arrays.  Affymetrix hybridization ovens are used to incubate the arrays at a constant temperature of 45oC overnight.  
##                                                                                                                                                                                                                                                                          hyb_protocol.1
## GSM210004 Preparation of microarrays for scanning is carried out with Affymetrix appropriate wash protocols matched to the specific chip type on a Model 450 Fluidics station.  Affymetrix GeneChip Operating Software (GCOS) operating system controls the Fluidics station process.  
## GSM210005 Preparation of microarrays for scanning is carried out with Affymetrix appropriate wash protocols matched to the specific chip type on a Model 450 Fluidics station.  Affymetrix GeneChip Operating Software (GCOS) operating system controls the Fluidics station process.  
## GSM210006 Preparation of microarrays for scanning is carried out with Affymetrix appropriate wash protocols matched to the specific chip type on a Model 450 Fluidics station.  Affymetrix GeneChip Operating Software (GCOS) operating system controls the Fluidics station process.  
## GSM210007 Preparation of microarrays for scanning is carried out with Affymetrix appropriate wash protocols matched to the specific chip type on a Model 450 Fluidics station.  Affymetrix GeneChip Operating Software (GCOS) operating system controls the Fluidics station process.  
## GSM210008 Preparation of microarrays for scanning is carried out with Affymetrix appropriate wash protocols matched to the specific chip type on a Model 450 Fluidics station.  Affymetrix GeneChip Operating Software (GCOS) operating system controls the Fluidics station process.  
## GSM210009   Preparation of microarrays for scanning is carried out with Affymetrix appropriate wash protocols matched to the specific chip type on a Model 450 Fluidics station.  Affymetrix GeneChip Operating Software (GCOS) operating system controls the Fluidics station process.
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          scan_protocol
## GSM210004 Scanning is carried out on a GeneChip® Scanner 3000 7G scanner with autoloader. The Affymetrix GCOS v1.3 operating system controls the Model 3000 7G scanner and data acquisition functions. GCOS maintains the mediated first level data analysis and desktop data management for the entire GeneChip System. Chip library files specific to each array and necessary for scan interpretation are stored on the computer workstation controlling the scanner and are updated regularly as necessary when updates are made available from Affymetrix. Collected research data is stored on the hard drive of the instrument computer, transferred to a mirrored storage disk and to  a raid 5 server operating on the Partners Healthcare network and backed up nightly via the Partners Healthcare Systems IT backup utility.  All systems are CFR21-11 and HIPPA compliant
## GSM210005 Scanning is carried out on a GeneChip® Scanner 3000 7G scanner with autoloader. The Affymetrix GCOS v1.3 operating system controls the Model 3000 7G scanner and data acquisition functions. GCOS maintains the mediated first level data analysis and desktop data management for the entire GeneChip System. Chip library files specific to each array and necessary for scan interpretation are stored on the computer workstation controlling the scanner and are updated regularly as necessary when updates are made available from Affymetrix. Collected research data is stored on the hard drive of the instrument computer, transferred to a mirrored storage disk and to  a raid 5 server operating on the Partners Healthcare network and backed up nightly via the Partners Healthcare Systems IT backup utility.  All systems are CFR21-11 and HIPPA compliant
## GSM210006 Scanning is carried out on a GeneChip® Scanner 3000 7G scanner with autoloader. The Affymetrix GCOS v1.3 operating system controls the Model 3000 7G scanner and data acquisition functions. GCOS maintains the mediated first level data analysis and desktop data management for the entire GeneChip System. Chip library files specific to each array and necessary for scan interpretation are stored on the computer workstation controlling the scanner and are updated regularly as necessary when updates are made available from Affymetrix. Collected research data is stored on the hard drive of the instrument computer, transferred to a mirrored storage disk and to  a raid 5 server operating on the Partners Healthcare network and backed up nightly via the Partners Healthcare Systems IT backup utility.  All systems are CFR21-11 and HIPPA compliant
## GSM210007 Scanning is carried out on a GeneChip® Scanner 3000 7G scanner with autoloader. The Affymetrix GCOS v1.3 operating system controls the Model 3000 7G scanner and data acquisition functions. GCOS maintains the mediated first level data analysis and desktop data management for the entire GeneChip System. Chip library files specific to each array and necessary for scan interpretation are stored on the computer workstation controlling the scanner and are updated regularly as necessary when updates are made available from Affymetrix. Collected research data is stored on the hard drive of the instrument computer, transferred to a mirrored storage disk and to  a raid 5 server operating on the Partners Healthcare network and backed up nightly via the Partners Healthcare Systems IT backup utility.  All systems are CFR21-11 and HIPPA compliant
## GSM210008 Scanning is carried out on a GeneChip® Scanner 3000 7G scanner with autoloader. The Affymetrix GCOS v1.3 operating system controls the Model 3000 7G scanner and data acquisition functions. GCOS maintains the mediated first level data analysis and desktop data management for the entire GeneChip System. Chip library files specific to each array and necessary for scan interpretation are stored on the computer workstation controlling the scanner and are updated regularly as necessary when updates are made available from Affymetrix. Collected research data is stored on the hard drive of the instrument computer, transferred to a mirrored storage disk and to  a raid 5 server operating on the Partners Healthcare network and backed up nightly via the Partners Healthcare Systems IT backup utility.  All systems are CFR21-11 and HIPPA compliant
## GSM210009 Scanning is carried out on a GeneChip® Scanner 3000 7G scanner with autoloader. The Affymetrix GCOS v1.3 operating system controls the Model 3000 7G scanner and data acquisition functions. GCOS maintains the mediated first level data analysis and desktop data management for the entire GeneChip System. Chip library files specific to each array and necessary for scan interpretation are stored on the computer workstation controlling the scanner and are updated regularly as necessary when updates are made available from Affymetrix. Collected research data is stored on the hard drive of the instrument computer, transferred to a mirrored storage disk and to  a raid 5 server operating on the Partners Healthcare network and backed up nightly via the Partners Healthcare Systems IT backup utility.  All systems are CFR21-11 and HIPPA compliant
##                                                                                                                                                                                                                                                       scan_protocol.1
## GSM210004 Please take note: this material is available only for use in support of grant applications or publications by PHS investigators intending to use or using  the HPCGG services.  Any other use of this material without express authorization is prohibited.
## GSM210005 Please take note: this material is available only for use in support of grant applications or publications by PHS investigators intending to use or using  the HPCGG services.  Any other use of this material without express authorization is prohibited.
## GSM210006 Please take note: this material is available only for use in support of grant applications or publications by PHS investigators intending to use or using  the HPCGG services.  Any other use of this material without express authorization is prohibited.
## GSM210007 Please take note: this material is available only for use in support of grant applications or publications by PHS investigators intending to use or using  the HPCGG services.  Any other use of this material without express authorization is prohibited.
## GSM210008 Please take note: this material is available only for use in support of grant applications or publications by PHS investigators intending to use or using  the HPCGG services.  Any other use of this material without express authorization is prohibited.
## GSM210009                                                                                                                                                                                                                                                            
##                   scan_protocol.2
## GSM210004 HPCGG Operations, 2006.
## GSM210005 HPCGG Operations, 2006.
## GSM210006 HPCGG Operations, 2006.
## GSM210007 HPCGG Operations, 2006.
## GSM210008 HPCGG Operations, 2006.
## GSM210009                        
##                                                           description
## GSM210004 FEV1:2.54, Ratio:58, Diagnosis: NSC Squamous, ArrayID: 610A
## GSM210005 FEV1:1.69, Ratio:83.66, Diagnosis: NSC Adeno, ArrayID: 610B
## GSM210006    FEV1:1.41, Ratio:51, Diagnosis: NSC Adeno, ArrayID: 610C
## GSM210007 FEV1:2.51, Ratio:80.96, Diagnosis: NSC Adeno, ArrayID: 610D
## GSM210008    FEV1:1.64, Ratio:57, Diagnosis: NSC Adeno, ArrayId: 610E
## GSM210009   FEV1:2.72, Ratio:74, Diagnosis: NSC Squamo, ArrayID: 610F
##           description.1 data_processing platform_id             contact_name
## GSM210004                          Mas5      GPL570 Soumyaroop,,Bhattacharya
## GSM210005                          Mas5      GPL570 Soumyaroop,,Bhattacharya
## GSM210006                          Mas5      GPL570 Soumyaroop,,Bhattacharya
## GSM210007                          Mas5      GPL570 Soumyaroop,,Bhattacharya
## GSM210008                          Mas5      GPL570 Soumyaroop,,Bhattacharya
## GSM210009                          Mas5      GPL570 Soumyaroop,,Bhattacharya
##                                        contact_email contact_phone  contact_fax
## GSM210004 Soumyaroop_Bhattacharya@URMC.rochester.edu  585-276-4683 585-276-2642
## GSM210005 Soumyaroop_Bhattacharya@URMC.rochester.edu  585-276-4683 585-276-2642
## GSM210006 Soumyaroop_Bhattacharya@URMC.rochester.edu  585-276-4683 585-276-2642
## GSM210007 Soumyaroop_Bhattacharya@URMC.rochester.edu  585-276-4683 585-276-2642
## GSM210008 Soumyaroop_Bhattacharya@URMC.rochester.edu  585-276-4683 585-276-2642
## GSM210009 Soumyaroop_Bhattacharya@URMC.rochester.edu  585-276-4683 585-276-2642
##           contact_laboratory contact_department
## GSM210004            Mariani         Pediatrics
## GSM210005            Mariani         Pediatrics
## GSM210006            Mariani         Pediatrics
## GSM210007            Mariani         Pediatrics
## GSM210008            Mariani         Pediatrics
## GSM210009            Mariani         Pediatrics
##                                contact_institute            contact_address
## GSM210004 University of Rochester Medical Center 601 Elmwood Avenue Box 850
## GSM210005 University of Rochester Medical Center 601 Elmwood Avenue Box 850
## GSM210006 University of Rochester Medical Center 601 Elmwood Avenue Box 850
## GSM210007 University of Rochester Medical Center 601 Elmwood Avenue Box 850
## GSM210008 University of Rochester Medical Center 601 Elmwood Avenue Box 850
## GSM210009 University of Rochester Medical Center 601 Elmwood Avenue Box 850
##           contact_city contact_state contact_zip/postal_code contact_country
## GSM210004    Rochester            NY                   14642             USA
## GSM210005    Rochester            NY                   14642             USA
## GSM210006    Rochester            NY                   14642             USA
## GSM210007    Rochester            NY                   14642             USA
## GSM210008    Rochester            NY                   14642             USA
## GSM210009    Rochester            NY                   14642             USA
##                                    contact_web_link
## GSM210004 http://lungtranscriptome.bwh.harvard.edu/
## GSM210005 http://lungtranscriptome.bwh.harvard.edu/
## GSM210006 http://lungtranscriptome.bwh.harvard.edu/
## GSM210007 http://lungtranscriptome.bwh.harvard.edu/
## GSM210008 http://lungtranscriptome.bwh.harvard.edu/
## GSM210009 http://lungtranscriptome.bwh.harvard.edu/
##                                                                          supplementary_file
## GSM210004 ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM210nnn/GSM210004/suppl/GSM210004.CEL.gz
## GSM210005 ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM210nnn/GSM210005/suppl/GSM210005.CEL.gz
## GSM210006 ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM210nnn/GSM210006/suppl/GSM210006.CEL.gz
## GSM210007 ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM210nnn/GSM210007/suppl/GSM210007.CEL.gz
## GSM210008 ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM210nnn/GSM210008/suppl/GSM210008.CEL.gz
## GSM210009 ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM210nnn/GSM210009/suppl/GSM210009.CEL.gz
##                                                                        supplementary_file.1
## GSM210004 ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM210nnn/GSM210004/suppl/GSM210004.CHP.gz
## GSM210005 ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM210nnn/GSM210005/suppl/GSM210005.CHP.gz
## GSM210006 ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM210nnn/GSM210006/suppl/GSM210006.CHP.gz
## GSM210007 ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM210nnn/GSM210007/suppl/GSM210007.CHP.gz
## GSM210008 ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM210nnn/GSM210008/suppl/GSM210008.CHP.gz
## GSM210009 ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM210nnn/GSM210009/suppl/GSM210009.CHP.gz
##           data_row_count                 relation               relation.1
## GSM210004          54675  Reanalyzed by: GSE60486 Reanalyzed by: GSE119087
## GSM210005          54675  Reanalyzed by: GSE60486 Reanalyzed by: GSE119087
## GSM210006          54675  Reanalyzed by: GSE60486 Reanalyzed by: GSE119087
## GSM210007          54675 Reanalyzed by: GSE119087                         
## GSM210008          54675  Reanalyzed by: GSE60486 Reanalyzed by: GSE119087
## GSM210009          54675  Reanalyzed by: GSE60486 Reanalyzed by: GSE119087
##           Age:ch1                                                Race:ch1
## GSM210004    <NA>         Caucasian, Age: 63, Gender: Male, Height: 72in.
## GSM210005    <NA> AfricanAmerican, Gender: Female, Age: 84, Height: 60in.
## GSM210006    <NA> AfricanAmerican, Gender: Female, Age: 65, Height: 66in.
## GSM210007    <NA>         Caucasian, Age: 46, Gender: Male, Height: 66in.
## GSM210008    <NA>       Caucasian, Age: 53, Gender: Female, Height: 65in.
## GSM210009    <NA>       Caucasian, Age: 60, Gender: Female, Height: 64in.

Time for this code chunk: 0.0283792018890381

Group names differ between experiments, but it is important to check that the “Control” group is the first factor level. If it is not, re-level the groups.

pData(geo7)["Disease"] <- factor(str_remove_all(pData(geo7)[,"title"], "[0-9]"))

table(pData(geo7)$Disease)
## 
##           Human_Control          Human_Controla         Human_COPD_Case 
##                      18                       1                      15 
##        Human_COPD_CaseX Human_Lung_Unclassified 
##                       1                      23

Time for this code chunk: 0.104738712310791
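
The table above contains two single-sample levels, Human_Controla and Human_COPD_CaseX, left over from suffixes in the sample titles. The following is a minimal base R sketch of how such levels could be collapsed and the reference level set. It uses a stand-in factor rebuilt from the counts printed above, not geo7 itself, and it assumes that the suffixed samples belong to the Control and COPD groups respectively; that assumption should be checked against the GEO record before use.

```r
# Stand-in for pData(geo7)$Disease, rebuilt from the counts printed above
lv <- c("Human_Control", "Human_Controla", "Human_COPD_Case",
        "Human_COPD_CaseX", "Human_Lung_Unclassified")
disease <- factor(rep(lv, times = c(18, 1, 15, 1, 23)), levels = lv)

# ASSUMPTION: the suffixed titles are ordinary Control / COPD samples.
# Assigning an existing level name merges the two levels.
levels(disease)[levels(disease) == "Human_Controla"]   <- "Human_Control"
levels(disease)[levels(disease) == "Human_COPD_CaseX"] <- "Human_COPD_Case"

# Make the control group the reference (first) level, whatever the
# ordering that factor() would pick in the current locale
disease <- relevel(disease, ref = "Human_Control")

table(disease)
```

If the levels are merged like this, the design matrix has fewer columns, so the coefficient index passed to DE() would need to be looked up again.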

Differential expression analysis

Using the DE() function (described above), we fit a linear regression model to calculate the log fold change of every gene between the “Control” and “COPD” groups. We also rename the columns by appending the GSE ID and, finally, save the output to a .csv file.

de7 <- DE(geo7,coeff = 3)
## [1] "(Intercept)"                    "DiseaseHuman_Controla"         
## [3] "DiseaseHuman_COPD_Case"         "DiseaseHuman_COPD_CaseX"       
## [5] "DiseaseHuman_Lung_Unclassified"

colnames(de7) <- str_c(colnames(de7),"_",gse7)
colnames(de7)
##  [1] "rownames_GSE8581"                        
##  [2] "ID_GSE8581"                              
##  [3] "GB_ACC_GSE8581"                          
##  [4] "SPOT_ID_GSE8581"                         
##  [5] "Species.Scientific.Name_GSE8581"         
##  [6] "Annotation.Date_GSE8581"                 
##  [7] "Sequence.Type_GSE8581"                   
##  [8] "Sequence.Source_GSE8581"                 
##  [9] "Target.Description_GSE8581"              
## [10] "Representative.Public.ID_GSE8581"        
## [11] "Gene.Title_GSE8581"                      
## [12] "Gene.Symbol_GSE8581"                     
## [13] "ENTREZ_GENE_ID_GSE8581"                  
## [14] "RefSeq.Transcript.ID_GSE8581"            
## [15] "Gene.Ontology.Biological.Process_GSE8581"
## [16] "Gene.Ontology.Cellular.Component_GSE8581"
## [17] "Gene.Ontology.Molecular.Function_GSE8581"
## [18] "logFC_GSE8581"                           
## [19] "CI.L_GSE8581"                            
## [20] "CI.R_GSE8581"                            
## [21] "AveExpr_GSE8581"                         
## [22] "t_GSE8581"                               
## [23] "P.Value_GSE8581"                         
## [24] "adj.P.Val_GSE8581"                       
## [25] "B_GSE8581"
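write_csv(de7,
          # Output file passed by position: this argument is named `path` in
          # readr 1.3.1 (used here) and `file` in later readr releases
          file.path(OUTPUT_DIR, str_c("TableGenes_", gse7, "_", TODAY, ".csv"))
          )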

Time for this code chunk: 3.5228488445282
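
The coefficient names printed by DE() above suggest a design of the form ~ Disease with the first factor level as baseline, in which case coeff = 3 points at DiseaseHuman_COPD_Case. The sketch below is an assumption about how DE() builds its design, not its actual code. It rebuilds such a design in base R from the group counts shown earlier and looks the coefficient up by name instead of by position.

```r
# Stand-in for pData(geo7)$Disease, using the level order shown in the output
lv <- c("Human_Control", "Human_Controla", "Human_COPD_Case",
        "Human_COPD_CaseX", "Human_Lung_Unclassified")
Disease <- factor(rep(lv, times = c(18, 1, 15, 1, 23)), levels = lv)

# ASSUMPTION: DE() uses a treatment-contrast design like this one,
# with the first level (Human_Control) absorbed into the intercept
design <- model.matrix(~ Disease)
colnames(design)

# Look the COPD coefficient up by name instead of hard-coding its position
coeff <- which(colnames(design) == "DiseaseHuman_COPD_Case")
coeff
```

Looking the coefficient up by name keeps the comparison pointed at the COPD group even if the number or order of factor levels changes between experiments.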

Output

This script produces the following data, which can be found in /home/ana/R-projects/Meta-analysis_COPD:

Tables with DE results: tables with the calculated log fold changes and p-values
Table of merged results: a single table combining all DE results

Session Info

sessionInfo()
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=it_IT.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=it_IT.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=it_IT.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=it_IT.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    parallel  stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] pd.hg.u133.plus.2_3.12.0    pd.huex.1.0.st.v2_3.14.1   
##  [3] pd.hg.u133a_3.12.0          pd.hu6800_3.12.0           
##  [5] DBI_1.1.0                   RSQLite_2.2.0              
##  [7] recount_1.14.0              org.Hs.eg.db_3.11.4        
##  [9] AnnotationDbi_1.50.1        DESeq2_1.28.1              
## [11] GEOquery_2.56.0             SummarizedExperiment_1.18.1
## [13] DelayedArray_0.14.0         matrixStats_0.56.0         
## [15] GenomicRanges_1.40.0        GenomeInfoDb_1.24.2        
## [17] limma_3.44.3                forcats_0.5.0              
## [19] stringr_1.4.0               dplyr_1.0.0                
## [21] purrr_0.3.4                 readr_1.3.1                
## [23] tidyr_1.1.0                 tibble_3.0.2               
## [25] ggplot2_3.3.2               tidyverse_1.3.0            
## [27] oligo_1.52.0                Biostrings_2.56.0          
## [29] XVector_0.28.0              IRanges_2.22.2             
## [31] S4Vectors_0.26.1            Biobase_2.48.0             
## [33] oligoClasses_1.50.0         BiocGenerics_0.34.0        
## [35] knitr_1.29                 
## 
## loaded via a namespace (and not attached):
##   [1] readxl_1.3.1             backports_1.1.8          Hmisc_4.4-0             
##   [4] BiocFileCache_1.12.0     plyr_1.8.6               splines_4.0.2           
##   [7] BiocParallel_1.22.0      digest_0.6.25            foreach_1.5.0           
##  [10] htmltools_0.5.0          fansi_0.4.1              magrittr_1.5            
##  [13] checkmate_2.0.0          memoise_1.1.0            BSgenome_1.56.0         
##  [16] cluster_2.1.0            annotate_1.66.0          modelr_0.1.8            
##  [19] askpass_1.1              prettyunits_1.1.1        jpeg_0.1-8.1            
##  [22] colorspace_1.4-1         blob_1.2.1               rvest_0.3.5             
##  [25] rappdirs_0.3.1           haven_2.3.1              xfun_0.15               
##  [28] crayon_1.3.4             RCurl_1.98-1.2           jsonlite_1.7.0          
##  [31] genefilter_1.70.0        survival_3.2-3           VariantAnnotation_1.34.0
##  [34] iterators_1.0.12         glue_1.4.1               gtable_0.3.0            
##  [37] zlibbioc_1.34.0          rentrez_1.2.2            scales_1.1.1            
##  [40] rngtools_1.5             derfinderHelper_1.22.0   derfinder_1.22.0        
##  [43] Rcpp_1.0.5               xtable_1.8-4             progress_1.2.2          
##  [46] htmlTable_2.0.1          bumphunter_1.30.0        foreign_0.8-80          
##  [49] bit_1.1-15.2             preprocessCore_1.50.0    Formula_1.2-3           
##  [52] htmlwidgets_1.5.1        httr_1.4.1               RColorBrewer_1.1-2      
##  [55] acepack_1.4.1            ellipsis_0.3.1           ff_2.2-14.2             
##  [58] pkgconfig_2.0.3          XML_3.99-0.4             nnet_7.3-14             
##  [61] dbplyr_1.4.4             locfit_1.5-9.4           reshape2_1.4.4          
##  [64] tidyselect_1.1.0         rlang_0.4.6              munsell_0.5.0           
##  [67] cellranger_1.1.0         tools_4.0.2              cli_2.0.2               
##  [70] downloader_0.4           generics_0.0.2           broom_0.5.6             
##  [73] evaluate_0.14            yaml_2.2.1               bit64_0.9-7             
##  [76] fs_1.4.2                 doRNG_1.8.2              nlme_3.1-148            
##  [79] xml2_1.3.2               biomaRt_2.44.1           BiocStyle_2.16.0        
##  [82] compiler_4.0.2           rstudioapi_0.11          curl_4.3                
##  [85] png_0.1-7                affyio_1.58.0            reprex_0.3.0            
##  [88] geneplotter_1.66.0       stringi_1.4.6            highr_0.8               
##  [91] GenomicFeatures_1.40.0   GenomicFiles_1.24.0      lattice_0.20-41         
##  [94] Matrix_1.2-18            vctrs_0.3.1              pillar_1.4.4            
##  [97] lifecycle_0.2.0          BiocManager_1.30.10      data.table_1.12.8       
## [100] bitops_1.0-6             qvalue_2.20.0            rtracklayer_1.48.0      
## [103] R6_2.4.1                 latticeExtra_0.6-29      gridExtra_2.3           
## [106] affxparser_1.60.0        codetools_0.2-16         assertthat_0.2.1        
## [109] openssl_1.4.2            withr_2.2.0              GenomicAlignments_1.24.0
## [112] Rsamtools_2.4.0          GenomeInfoDbData_1.2.3   hms_0.5.3               
## [115] grid_4.0.2               rpart_4.1-15             rmarkdown_2.3           
## [118] lubridate_1.7.9          base64enc_0.1-3

Time for this code chunk: 0.0573010444641113